[D] How LLMs are changing search
submitted by /u/firef1y1
( 8 min )
Systematic reviews are vital for guiding practice, research, and policy, yet
they are often slow and labour-intensive. Large language models (LLMs) could
offer a way to speed up and automate systematic reviews, but their performance
in such tasks has not been comprehensively evaluated against humans, and no
study has tested GPT-4, the largest LLM to date. This pre-registered study
evaluates GPT-4's capability in title/abstract screening, full-text review, and
data extraction across various literature types and languages using a
'human-out-of-the-loop' approach. Although GPT-4 had accuracy on par with human
performance in most tasks, results were skewed by chance agreement and dataset
imbalance. After adjusting for these, GPT-4 showed moderate performance in
data extraction and, except where highly reliable prompts were used,
none-to-moderate screening performance across stages and languages. When
screening full-text literature with highly reliable prompts,
GPT-4's performance was 'almost perfect.' Penalising GPT-4 for missing key
studies using highly reliable prompts improved its performance even more. Our
findings indicate that, currently, substantial caution should be used if LLMs
are being used to conduct systematic reviews, but suggest that, for certain
systematic review tasks delivered under reliable prompts, LLMs can rival human
performance.
( 3 min )
We study the problem of designing adaptive multi-armed bandit algorithms that
perform optimally in both the stochastic setting and the adversarial setting
simultaneously (often known as a best-of-both-world guarantee). A line of
recent works shows that when configured and analyzed properly, the
Follow-the-Regularized-Leader (FTRL) algorithm, originally designed for the
adversarial setting, can in fact optimally adapt to the stochastic setting as
well. Such results, however, critically rely on an assumption that there exists
one unique optimal arm. Recently, Ito (2021) took the first step to remove such
an undesirable uniqueness assumption for one particular FTRL algorithm with the
$\frac{1}{2}$-Tsallis entropy regularizer. In this work, we significantly
improve and generalize this result, showing that uniqueness is unnecessary for
FTRL with a broad family of regularizers and a new learning rate schedule. For
some regularizers, our regret bounds also improve upon prior results even when
uniqueness holds. We further provide an application of our results to the
decoupled exploration and exploitation problem, demonstrating that our
techniques are broadly applicable.
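As an illustrative sketch (not the paper's construction), FTRL with the $\frac{1}{2}$-Tsallis entropy regularizer has a near-closed form: minimizing $\eta\langle L, p\rangle - \sum_i \sqrt{p_i}$ over the simplex gives $p_i = 1/(4(\eta L_i + \lambda)^2)$, with the multiplier $\lambda$ found by binary search:

```python
import numpy as np

def ftrl_tsallis_probs(cum_loss, eta, tol=1e-10):
    """FTRL arm distribution under the 1/2-Tsallis entropy regularizer.

    Stationarity of eta*<L, p> - sum(sqrt(p_i)) on the simplex gives
    p_i = 1 / (4 * (eta * L_i + lam)^2), with lam set so that sum(p) = 1.
    """
    L = np.asarray(cum_loss, dtype=float)
    # sum(p) is decreasing in lam; bracket lam so all denominators stay positive.
    lo = -eta * L.min() + 1e-12      # p of the best arm blows up near this end
    hi = -eta * L.min() + len(L)
    while np.sum(1.0 / (4.0 * (eta * L + hi) ** 2)) > 1.0:
        hi *= 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if np.sum(1.0 / (4.0 * (eta * L + mid) ** 2)) > 1.0:
            lo = mid
        else:
            hi = mid
    p = 1.0 / (4.0 * (eta * L + hi) ** 2)
    return p / p.sum()               # tidy up the last bit of rounding
```

The distribution concentrates on arms with small cumulative loss while keeping exploration mass on the others.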
( 3 min )
Graph neural networks (GNNs) have become compelling models designed to
perform learning and inference on graph-structured data. However, little work
has been done to understand the fundamental limitations of GNNs for scaling to
larger graphs and generalizing to out-of-distribution (OOD) inputs. In this
paper, we use a random graph generator to systematically investigate how the
graph size and structural properties affect the predictive performance of GNNs.
We present specific evidence that the average node degree is a key feature in
determining whether GNNs can generalize to unseen graphs, and that the use of
multiple node update functions can improve the generalization performance of
GNNs when dealing with graphs of multimodal degree distributions. Accordingly,
we propose a multi-module GNN framework that allows the network to adapt
flexibly to new graphs by generalizing a single canonical nonlinear
transformation over aggregated inputs. Our results show that multi-module
GNNs improve OOD generalization on a variety of inference tasks across
diverse structural features.
( 2 min )
Stochastic gradient descent (SGD) algorithm is the method of choice in many
machine learning tasks thanks to its scalability and efficiency in dealing with
large-scale problems. In this paper, we focus on the shuffling version of SGD
which matches the mainstream practical heuristics. We show the convergence to a
global solution of shuffling SGD for a class of non-convex functions under
over-parameterized settings. Our analysis employs more relaxed non-convex
assumptions than previous literature. Nevertheless, we maintain the desired
computational complexity as shuffling SGD has achieved in the general convex
setting.
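A minimal sketch of the shuffling (random-reshuffling) variant on an interpolating least-squares problem; the sizes and learning rate are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 64, 8
X = rng.normal(size=(n, d))
y = X @ rng.normal(size=d)            # consistent system: a zero-loss global minimum exists

w = np.zeros(d)
lr = 0.05
for epoch in range(100):
    # Shuffling SGD: one random permutation per epoch,
    # each sample visited exactly once.
    for i in rng.permutation(n):
        grad = (X[i] @ w - y[i]) * X[i]   # gradient of 0.5 * (x_i @ w - y_i)^2
        w -= lr * grad

loss = 0.5 * np.mean((X @ w - y) ** 2)    # approaches the global minimum (zero)
```

This matches the mainstream heuristic of sampling without replacement within each epoch, rather than i.i.d. sampling.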
( 2 min )
We study the bias of Stochastic Gradient Descent (SGD) to learn low-rank
weight matrices when training deep neural networks. Our results show that
training neural networks with mini-batch SGD and weight decay causes a bias
towards rank minimization over the weight matrices. Specifically, we show, both
theoretically and empirically, that this bias is more pronounced when using
smaller batch sizes, higher learning rates, or increased weight decay.
Additionally, we predict and observe empirically that weight decay is necessary
to achieve this bias. Unlike previous literature, our analysis does not rely on
assumptions about the data, convergence, or optimality of the weight matrices
and applies to a wide range of neural network architectures of any width or
depth. Finally, we empirically investigate the connection between this bias and
generalization, finding that it has a marginal effect on generalization.
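One illustrative way to quantify such a rank bias in practice (our own metric choice, not the paper's analysis) is the stable rank of a weight matrix:

```python
import numpy as np

def stable_rank(W):
    """Stable rank ||W||_F^2 / ||W||_2^2: a smooth surrogate for the
    rank that is robust to small singular values."""
    s = np.linalg.svd(W, compute_uv=False)   # singular values, descending
    return float(np.sum(s ** 2) / s[0] ** 2)

rng = np.random.default_rng(0)
dense = rng.normal(size=(50, 50))                           # roughly full rank
rank1 = np.outer(rng.normal(size=50), rng.normal(size=50))  # exactly rank 1
```

Tracking this quantity during training with different batch sizes or weight-decay strengths would make the claimed bias directly visible.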
( 2 min )
This research underscores the efficacy of Fourier topological optimization in
refining MRI imagery, thereby bolstering the classification precision of
Alzheimer's Disease through convolutional neural networks. Recognizing that MRI
scans are indispensable for neurological assessments, but frequently grapple
with issues like blurriness and contrast irregularities, the deployment of
Fourier topological optimization offered enhanced delineation of brain
structures, ameliorated noise, and superior contrast. The applied techniques
prioritized boundary enhancement, contrast and brightness adjustments, and
overall image lucidity. Employing CNN architectures VGG16, ResNet50,
InceptionV3, and Xception, the post-optimization analysis revealed a marked
elevation in performance. Conclusively, the amalgamation of Fourier topological
optimization with CNNs delineates a promising trajectory for the nuanced
classification of Alzheimer's Disease, portending a transformative impact on
its diagnostic paradigms.
( 2 min )
As large language models (LLMs) are widely adopted, new safety issues and
policies emerge, to which existing safety classifiers do not generalize well.
If we have only observed a few examples of violations of a new safety rule, how
can we build a classifier to detect violations? In this paper, we study the
novel setting of domain-generalized few-shot learning for LLM-based text safety
classifiers. Unlike prior few-shot work, these new safety issues can be hard to
uncover and we do not get to choose the few examples. We demonstrate that
existing few-shot techniques do not perform well in this setting; instead, we
propose parameter-efficient fine-tuning (PEFT) combined with augmenting
training data based on similar examples in prior existing rules. We empirically
show that our approach of similarity-based data-augmentation + prompt-tuning
(DAPT) consistently outperforms baselines that either do not rely on data
augmentation or on PEFT by 7-17% F1 score in the Social Chemistry moral
judgement and 9-13% AUC in the Toxicity detection tasks, even when the new rule
is loosely correlated with existing ones.
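A hypothetical sketch of the similarity-based augmentation step: retrieve, from examples of prior rules, the items whose embeddings are most cosine-similar to the new rule's few shots (the function name, shapes, and data here are all illustrative assumptions):

```python
import numpy as np

def augment_few_shot(new_examples, new_embs, prior_examples, prior_embs, k=2):
    """Augment the few shots of a new rule with the k prior-rule
    examples whose embeddings are most cosine-similar to any new shot."""
    a = new_embs / np.linalg.norm(new_embs, axis=1, keepdims=True)
    b = prior_embs / np.linalg.norm(prior_embs, axis=1, keepdims=True)
    sims = (a @ b.T).max(axis=0)       # best similarity per prior example
    best = np.argsort(-sims)[:k]
    return list(new_examples) + [prior_examples[i] for i in best]
```

The augmented set would then feed a PEFT method such as prompt tuning.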
( 2 min )
A fundamental problem of causal discovery is cause-effect inference, learning
the correct causal direction between two random variables. Significant progress
has been made through modelling the effect as a function of its cause and a
noise term, which allows us to leverage assumptions about the generating
function class. The recently introduced heteroscedastic location-scale noise
functional models (LSNMs) combine expressive power with identifiability
guarantees. LSNM model selection based on maximizing likelihood achieves
state-of-the-art accuracy, when the noise distributions are correctly
specified. However, through an extensive empirical evaluation, we demonstrate
that the accuracy deteriorates sharply when the form of the noise distribution
is misspecified by the user. Our analysis shows that the failure occurs mainly
when the conditional variance in the anti-causal direction is smaller than that
in the causal direction. As an alternative, we find that causal model selection
through residual independence testing is much more robust to noise
misspecification and misleading conditional variance.
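A toy sketch of direction inference via residual independence testing, using a polynomial fit and a simple HSIC dependence score (the data, fit degree, and bandwidth heuristic are illustrative assumptions, not the paper's procedure):

```python
import numpy as np

def hsic(u, v):
    """Biased HSIC estimate with Gaussian kernels (median-heuristic
    bandwidths): larger values indicate stronger dependence."""
    n = len(u)
    def gram(t):
        d2 = np.subtract.outer(t, t) ** 2
        return np.exp(-d2 / np.median(d2[d2 > 0]))
    H = np.eye(n) - np.ones((n, n)) / n
    return float(np.trace(gram(u) @ H @ gram(v) @ H)) / n ** 2

rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, 300)
y = x ** 3 + 0.1 * rng.normal(size=300)        # ground truth: x causes y

def residuals(cause, effect, deg=3):
    return effect - np.polyval(np.polyfit(cause, effect, deg), cause)

score_xy = hsic(x, residuals(x, y))   # causal fit: residuals ~ independent of input
score_yx = hsic(y, residuals(y, x))   # anticausal fit: residuals stay dependent
direction = "x->y" if score_xy < score_yx else "y->x"
```

The direction with the more independent residuals is declared causal, which is the robustness-friendly alternative the abstract points to.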
( 2 min )
Cohen et al. (2021) empirically study the evolution of the largest eigenvalue
of the loss Hessian, also known as sharpness, along the gradient descent (GD)
trajectory and observe the Edge of Stability (EoS) phenomenon. The sharpness
increases at the early phase of training (referred to as progressive
sharpening), and eventually saturates close to the threshold of $2 /
\text{(step size)}$. In this paper, we start by demonstrating through empirical
studies that when the EoS phenomenon occurs, different GD trajectories (after a
proper reparameterization) align on a specific bifurcation diagram independent
of initialization. We then rigorously prove this trajectory alignment
phenomenon for a two-layer fully-connected linear network and a single-neuron
nonlinear network trained with a single data point. Our trajectory alignment
analysis establishes both progressive sharpening and EoS phenomena,
encompassing and extending recent findings in the literature.
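For a quadratic loss, sharpness is simply the top Hessian eigenvalue, and the $2/\text{(step size)}$ threshold can be checked directly; a minimal sketch using power iteration on Hessian-vector products (the matrix stands in for a real loss Hessian):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(20, 20))
A = A @ A.T / 20    # PSD matrix playing the role of the loss Hessian

def sharpness(hvp, dim, iters=200):
    """Largest Hessian eigenvalue estimated by power iteration on
    Hessian-vector products (no explicit Hessian required)."""
    v = np.ones(dim) / np.sqrt(dim)
    for _ in range(iters):
        v = hvp(v)
        v /= np.linalg.norm(v)
    return float(v @ hvp(v))

lam_max = sharpness(lambda v: A @ v, 20)
stable_step = 2.0 / lam_max   # GD on 0.5 * w @ A @ w diverges beyond this step size
```

In deep networks the same power iteration works with autodiff Hessian-vector products, which is how the EoS threshold is typically monitored.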
( 2 min )
Agglomerative hierarchical clustering based on Ordered Weighted Averaging
(OWA) operators not only generalises the single, complete, and average
linkages, but also includes intercluster distances based on a few nearest or
farthest neighbours, trimmed and winsorised means of pairwise point
similarities, amongst many others. We explore the relationships between the
famous Lance-Williams update formula and the extended OWA-based linkages with
weights generated via infinite coefficient sequences. Furthermore, we provide
some conditions for the weight generators to guarantee the resulting
dendrograms to be free from unaesthetic inversions.
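A minimal sketch of an OWA-based intercluster distance, showing how single, complete, and average linkage arise from particular weight vectors:

```python
import numpy as np

def owa_linkage(pair_dists, weights):
    """OWA intercluster distance: sort the pairwise point distances in
    decreasing order, then take the weighted mean with the given weights."""
    d = np.sort(np.asarray(pair_dists, dtype=float))[::-1]
    w = np.asarray(weights, dtype=float)
    return float(d @ (w / w.sum()))

d = [4.0, 1.0, 3.0, 2.0]                  # all pairwise distances between two clusters
complete = owa_linkage(d, [1, 0, 0, 0])   # largest distance: complete linkage
single = owa_linkage(d, [0, 0, 0, 1])     # smallest distance: single linkage
average = owa_linkage(d, [1, 1, 1, 1])    # plain mean: average linkage
```

Intermediate weight vectors give nearest/farthest-neighbour, trimmed, and winsorised variants mentioned in the abstract.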
( 2 min )
We describe a new direct method to estimate bipartite mutual information of a
classical spin system based on Monte Carlo sampling enhanced by autoregressive
neural networks. It allows studying arbitrary geometries of subsystems and can
be generalized to classical field theories. We demonstrate it on the Ising
model for four partitionings, including a multiply-connected even-odd division.
We show that the area law is satisfied for temperatures away from the critical
temperature: the constant term is universal, whereas the proportionality
coefficient is different for the even-odd partitioning.
( 2 min )
Graph generative model evaluation necessitates understanding differences
between graphs on the distributional level. This entails being able to harness
salient attributes of graphs in an efficient manner. Curvature constitutes one
such property that has recently proved its utility in characterising graphs.
Its expressive properties, stability, and practical utility in model evaluation
remain largely unexplored, however. We combine graph curvature descriptors with
emerging methods from topological data analysis to obtain robust, expressive
descriptors for evaluating graph generative models.
( 2 min )
Generative diffusion models have achieved spectacular performance in many
areas of generative modeling. While the fundamental ideas behind these models
come from non-equilibrium physics, in this paper we show that many aspects of
these models can be understood using the tools of equilibrium statistical
mechanics. Using this reformulation, we show that generative diffusion models
undergo second-order phase transitions corresponding to symmetry breaking
phenomena. We argue that this leads to a form of instability that lies at the
heart of their generative capabilities and that can be described by a set of
mean field critical exponents. We conclude by analyzing recent work connecting
diffusion models and associative memory networks in view of the thermodynamic
formulations.
( 2 min )
Flexible models for probability distributions are an essential ingredient in
many machine learning tasks. We develop and investigate a new class of
probability distributions, which we call a Squared Neural Family (SNEFY),
formed by squaring the 2-norm of a neural network and normalising it with
respect to a base measure. Following reasoning similar to the well-established
connections between infinitely wide neural networks and Gaussian
processes, we show that SNEFYs admit closed form normalising constants in many
cases of interest, thereby resulting in flexible yet fully tractable density
models. SNEFYs strictly generalise classical exponential families, are closed
under conditioning, and have tractable marginal distributions. Their utility is
illustrated on a variety of density estimation, conditional density estimation,
and density estimation with missing data tasks.
( 2 min )
Neural additive models (NAMs) can improve the interpretability of deep neural
networks by handling input features in separate additive sub-networks. However,
they lack inherent mechanisms that provide calibrated uncertainties and enable
selection of relevant features and interactions. Approaching NAMs from a
Bayesian perspective, we enhance them in three primary ways, namely by a)
providing credible intervals for the individual additive sub-networks; b)
estimating the marginal likelihood to perform an implicit selection of features
via an empirical Bayes procedure; and c) enabling a ranking of feature pairs as
candidates for second-order interaction in fine-tuned models. In particular, we
develop Laplace-approximated NAMs (LA-NAMs), which show improved empirical
performance on tabular datasets and challenging real-world medical tasks.
( 2 min )
Stein thinning is a promising algorithm proposed by Riabiz et al. (2022) for
post-processing outputs of Markov chain Monte Carlo (MCMC). The main principle
is to greedily minimize the kernelized Stein discrepancy (KSD), which only
requires the gradient of the log-target distribution, and is thus well-suited
for Bayesian inference. The main advantages of Stein thinning are the automatic
removal of the burn-in period, the correction of the bias introduced by recent
MCMC algorithms, and the asymptotic properties of convergence towards the
target distribution. Nevertheless, Stein thinning suffers from several
empirical pathologies, which may result in poor approximations, as observed in
the literature. In this article, we conduct a theoretical analysis of these
pathologies, to clearly identify the mechanisms at stake, and suggest improved
strategies. Then, we introduce the regularized Stein thinning algorithm to
alleviate the identified pathologies. Finally, theoretical guarantees and
extensive experiments show the high efficiency of the proposed algorithm. An
implementation of regularized Stein thinning as the kernax library in Python
and JAX is available at https://gitlab.com/drti/kernax.
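A toy sketch of the greedy KSD-minimization principle for a 1-D standard normal target with an RBF base kernel (the target, bandwidth, and data are illustrative assumptions; this is not the kernax implementation):

```python
import numpy as np

def stein_kernel(x, y, h=1.0):
    """Langevin Stein kernel for the target N(0, 1) (score s(x) = -x)
    built on an RBF base kernel with bandwidth h."""
    d = x[:, None] - y[None, :]
    k = np.exp(-d ** 2 / (2 * h))
    sx, sy = -x[:, None], -y[None, :]
    return (sx * sy + (sx - sy) * d / h + 1.0 / h - d ** 2 / h ** 2) * k

def stein_thin(samples, m):
    """Greedy Stein thinning: repeatedly add the candidate point that
    most decreases the kernelized Stein discrepancy of the selected set."""
    K = stein_kernel(samples, samples)
    chosen = [int(np.argmin(np.diag(K)))]
    for _ in range(m - 1):
        obj = np.diag(K) + 2.0 * K[:, chosen].sum(axis=1)
        obj[chosen] = np.inf          # select without replacement
        chosen.append(int(np.argmin(obj)))
    return np.array(chosen)
```

Only the score (gradient of the log-target) enters the kernel, which is why the method suits Bayesian inference where the normalizing constant is unknown.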
( 3 min )
The out-of-sample error (OO) is the main quantity of interest in risk
estimation and model selection. Leave-one-out cross validation (LO) offers a
(nearly) distribution-free yet computationally demanding approach to estimate
OO. Recent theoretical work showed that approximate leave-one-out cross
validation (ALO) is a computationally efficient and statistically reliable
estimate of LO (and OO) for generalized linear models with differentiable
regularizers. For problems involving non-differentiable regularizers, despite
significant empirical evidence, the theoretical understanding of ALO's error
remains unknown. In this paper, we present a novel theory for a wide class of
problems in the generalized linear model family with non-differentiable
regularizers. We bound the error |ALO - LO| in terms of intuitive metrics such
as the size of leave-i-out perturbations in active sets, sample size n, number
of features p and regularization parameters. As a consequence, for the
$\ell_1$-regularized problems, we show that |ALO - LO| goes to zero as p goes
to infinity while n/p and SNR are fixed and bounded.
( 2 min )
Many real-world domains require safe decision making in uncertain
environments. In this work, we introduce a deep reinforcement learning
framework for approaching this important problem. We consider a distribution
over transition models, and apply a risk-averse perspective towards model
uncertainty through the use of coherent distortion risk measures. We provide
robustness guarantees for this framework by showing it is equivalent to a
specific class of distributionally robust safe reinforcement learning problems.
Unlike existing approaches to robustness in deep reinforcement learning,
however, our formulation does not involve minimax optimization. This leads to
an efficient, model-free implementation of our approach that only requires
standard data collection from a single training environment. In experiments on
continuous control tasks with safety constraints, we demonstrate that our
framework produces robust performance and safety at deployment time across a
range of perturbed test environments.
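As a minimal illustration, one widely used coherent distortion risk measure is conditional value-at-risk (CVaR), the mean of the worst $\alpha$-fraction of outcomes:

```python
import numpy as np

def cvar(returns, alpha=0.1):
    """Conditional value-at-risk: the mean of the worst alpha-fraction
    of returns (a coherent distortion risk measure)."""
    r = np.sort(np.asarray(returns, dtype=float))   # ascending: worst first
    k = max(1, int(np.ceil(alpha * len(r))))
    return float(r[:k].mean())
```

Optimizing such a measure over a distribution of transition models is what gives the framework its risk-averse, robustness-flavoured objective.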
( 2 min )
Generative artificial intelligence is transforming how enterprises do business. Organizations are using AI to improve data-driven decisions, enhance omnichannel experiences, and drive next-generation product development. Enterprises are using generative AI specifically to power their marketing efforts through emails, push notifications, and other outbound communication channels. Gartner predicts that “by 2025, 30% of outbound marketing messages […]
( 8 min )
Visualization is vital for understanding complex data, but existing tools require “tidy data,” adding extra steps. Learn how Data Formulator transforms concepts into visuals, promoting collaboration between analysts and AI agents.
The post Data Formulator: A concept-driven, AI-powered approach to data visualization appeared first on Microsoft Research.
( 10 min )
AI Weirdness: the strange side of machine learning
( 2 min )
Amazon Kendra is an intelligent search service powered by machine learning (ML). Amazon Kendra helps you easily aggregate content from a variety of content repositories into a centralized index that lets you quickly search all your enterprise data and find the most accurate answer. Drupal is a content management software. It’s used to make many […]
( 7 min )
This is a guest post by Jose Benitez, Founder and Director of AI and Mattias Ponchon, Head of Infrastructure at Intuitivo. Intuitivo, a pioneer in retail innovation, is revolutionizing shopping with its cloud-based AI and machine learning (AI/ML) transactional processing system. This groundbreaking technology enables us to operate millions of autonomous points of purchase (A-POPs) […]
( 8 min )
Enterprises seek to harness the potential of Machine Learning (ML) to solve complex problems and improve outcomes. Until recently, building and deploying ML models required deep levels of technical and coding skills, including tuning ML models and maintaining operational pipelines. Since its introduction in 2021, Amazon SageMaker Canvas has enabled business analysts to build, deploy, […]
( 8 min )
Researchers are taking deep learning for a deep dive, literally. The Woods Hole Oceanographic Institution (WHOI) Autonomous Robotics and Perception Laboratory (WARPLab) and MIT are developing a robot for studying coral reefs and their ecosystems. The WARPLab autonomous underwater vehicle (AUV), enabled by an NVIDIA Jetson Orin NX module, is an effort from the world’s […]
( 8 min )
The cloud is full of treats this GFN Thursday with Cities: Skylines II now streaming, leading 15 newly supported games this week. The game’s publisher, Paradox Interactive, is offering GeForce NOW one-month Priority memberships for those who pick up the game first, so make sure to grab one before they’re gone. Among the newly supported […]
( 7 min )
This research paper was presented at the 29th ACM Symposium on Operating Systems Principles (opens in new tab) (SOSP 2023), the premier forum for the theory and practice of computer systems software. For millennia, data has woven itself into every facet of our lives, from business and academia to personal spheres. Our production of data […]
The post Project Silica: Sustainable cloud archival storage in glass appeared first on Microsoft Research.
( 10 min )
Methane (CH4) is a major anthropogenic greenhouse gas that‘s a by-product of oil and gas extraction, coal mining, large-scale animal farming, and waste disposal, among other sources. The global warming potential of CH4 is 86 times that of CO2 and the Intergovernmental Panel on Climate Change (IPCC) estimates that methane is responsible for 30 percent of observed […]
( 12 min )
In this issue: Kosmos-2.5: A Multimodal Literate Model; Can vine copulas explain complex relationships of weather variables; New system accelerates the adaptive training process; Structural inequalities and relational labor in the influencer industry.
The post Research Focus: Week of October 23, 2023 appeared first on Microsoft Research.
( 10 min )
NVIDIA researchers are collaborating with academic centers worldwide to advance generative AI, robotics and the natural sciences — and more than a dozen of these projects will be shared at NeurIPS, one of the world’s top AI conferences. Set for Dec. 10-16 in New Orleans, NeurIPS brings together experts in generative AI, machine learning, computer […]
( 8 min )
In today’s information age, the vast volumes of data housed in countless documents present both a challenge and an opportunity for businesses. Traditional document processing methods often fall short in efficiency and accuracy, leaving room for innovation, cost-efficiency, and optimizations. Document processing has witnessed significant advancements with the advent of Intelligent Document Processing (IDP). With […]
( 20 min )
This post is co-authored by Dhurjati Brahma, Senior Systems Architect at T-Mobile US, Inc and Jim Chao, Principal Engineer/Architect at T-Mobile US, Inc and Nicholas Zellerhoff Associate Systems Architect at T-Mobile US, Inc. T-Mobile US, Inc. provides a Voicemail to Text service to its customers, which allows customers to quickly read through their voicemails and […]
( 7 min )
Written by Venkata Nori and Kshitij Gopali. Introduction As technology is evolving, most companies in the world are adopting advanced mechanisms for their daily tasks of storing/updating data, project management & tracking, incident management, version control, etc. Periodically, these companies’ business stakeholders would want to extract and analyze the data to see how the business…
The post Seamless integration of data from unconventional source systems into Business Intelligence using data science techniques appeared first on Data Science Central.
( 25 min )
A recent interview by Medical Device Network with GlobalData medical analyst Alexandra Murdoch shares interesting insights into cybersecurity for medical devices.
The post How data science and medical device cybersecurity cross paths to protect patients and enhance healthcare appeared first on Data Science Central.
( 22 min )
Visual effects artist Surfaced Studio returns to 'In the NVIDIA Studio' to share his real-world VFX project, created on a brand new Razer Blade 16 Mercury Edition laptop powered by GeForce RTX 4080 graphics.
( 8 min )
This post is co-authored by Anatoly Khomenko, Machine Learning Engineer, and Abdenour Bezzouh, Chief Technology Officer at Talent.com. Founded in 2011, Talent.com is one of the world’s largest sources of employment. The company combines paid job listings from their clients with public job listings into a single searchable platform. With over 30 million jobs listed […]
( 12 min )
Images such as those in Google Street View are taking on a new purpose in the hands of University of Florida Assistant Professor of Artificial Intelligence Chaofeng Wang. He’s using them, along with deep learning, in a research project to automate the evaluation of urban buildings. The project aims to help governments mitigate natural disaster […]
( 6 min )
The 15th Kendall Square Association annual meeting explored new and old aspects of the neighborhood.
( 9 min )
Pulsar timing arrays (PTAs) perform Bayesian posterior inference with
expensive MCMC methods. Given a dataset of ~10-100 pulsars and O(10^3) timing
residuals each, producing a posterior distribution for the stochastic
gravitational wave background (SGWB) can take days to a week. The computational
bottleneck arises because the likelihood evaluation required for MCMC is
extremely costly when considering the dimensionality of the search space.
Fortunately, generating simulated data is fast, so modern simulation-based
inference techniques can be brought to bear on the problem. In this paper, we
demonstrate how conditional normalizing flows trained on simulated data can be
used for extremely fast and accurate estimation of the SGWB posteriors,
reducing the sampling time from weeks to a matter of seconds.
( 2 min )
Randomized experimental comparisons of alternative pedagogical strategies
could provide useful empirical evidence in instructors' decision-making.
However, traditional experiments offer no clear, simple pathway for rapidly
using incoming data to increase the chances that students in an experiment
receive the best conditions. Drawing inspiration from the use of machine
learning and experimentation in product development at leading technology
companies, we explore how adaptive experimentation might help in continuous
course improvement. In adaptive experiments, as different arms/conditions are
deployed to students, data is analyzed and used to change the experience for
future students. This can be done using machine learning algorithms to identify
which actions are more promising for improving student experience or outcomes.
This algorithm can then dynamically deploy the most effective conditions to
future students, resulting in better support for students' needs. We illustrate
the approach with a case study providing a side-by-side comparison of
traditional and adaptive experimentation of self-explanation prompts in online
homework problems in a CS1 course. This provides a first step in exploring how
this methodology can help bridge research and practice in continuous course
improvement.
( 2 min )
Graph neural networks (GNNs) have gained significant popularity due to the
powerful capability to extract useful representations from graph data. As the
need for efficient GNN computation intensifies, a variety of programming
abstractions designed for optimizing GNN Aggregation have emerged to facilitate
acceleration. However, there has been no comprehensive evaluation or analysis
of existing abstractions, and thus no clear consensus on which approach is
better. In
this letter, we classify existing programming abstractions for GNN Aggregation
by the dimension of data organization and propagation method. By constructing
these abstractions on a state-of-the-art GNN library, we perform a thorough and
detailed characterization study to compare their performance and efficiency,
and provide several insights on future GNN acceleration based on our analysis.
( 2 min )
The performance of neural networks has been significantly improved by
increasing the number of channels in convolutional layers. However, this
increase in performance comes with a higher computational cost, resulting in
numerous studies focused on reducing it. One promising approach to address this
issue is group convolution, which effectively reduces the computational cost by
grouping channels. However, to the best of our knowledge, there has been no
theoretical analysis on how well the group convolution approximates the
standard convolution. In this paper, we mathematically analyze the
approximation of the group convolution to the standard convolution with respect
to the number of groups. Furthermore, we propose a novel variant of the group
convolution called balanced group convolution, which shows a higher
approximation with a small additional computational cost. We provide
experimental results that validate our theoretical findings and demonstrate the
superior performance of the balanced group convolution over other variants of
group convolution.
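The cost saving from grouping is easy to see from a parameter count: each group maps only its own slice of channels, dividing the weight count by the number of groups. A quick sketch:

```python
def conv_params(c_in, c_out, k, groups=1):
    """Weight count of a k x k convolution: each of the `groups` groups
    maps c_in/groups input channels to c_out/groups output channels."""
    assert c_in % groups == 0 and c_out % groups == 0
    return groups * (c_in // groups) * (c_out // groups) * k * k

standard = conv_params(256, 256, 3)             # 589,824 weights
grouped = conv_params(256, 256, 3, groups=8)    # 73,728 weights: 8x fewer
```

FLOPs scale the same way, which is why the approximation quality as a function of the number of groups is the quantity worth analyzing.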
( 2 min )
Molecular language modeling is an effective approach to generating novel
chemical structures. However, these models do not \emph{a priori} encode
certain preferences a chemist may desire. We investigate the use of fine-tuning
using Direct Preference Optimization to better align generated molecules with
chemist preferences. Our findings suggest that this approach is simple,
efficient, and highly effective.
( 2 min )
On-device machine learning (ML) enables the training process to exploit a
massive amount of user-generated private data samples. To enjoy this benefit,
inter-device communication overhead should be minimized. To this end, we
propose federated distillation (FD), a distributed model training algorithm
whose communication payload size is much smaller than a benchmark scheme,
federated learning (FL), particularly when the model size is large. Moreover,
user-generated data samples are likely to become non-IID across devices, which
commonly degrades the performance compared to the case with an IID dataset. To
cope with this, we propose federated augmentation (FAug), where each device
collectively trains a generative model, and thereby augments its local data
towards yielding an IID dataset. Empirical studies demonstrate that FD with
FAug yields around 26x less communication overhead while achieving 95-98% test
accuracy compared to FL.
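The payload gap between FL and FD comes from what each scheme uploads: FL sends a full model update, while FD sends only per-label average model outputs. A back-of-the-envelope sketch with hypothetical numbers (the function names and figures are illustrative, not from the paper):

```python
def fl_payload_bytes(num_params, bytes_per_float=4):
    # Federated learning uploads the full model update every round.
    return num_params * bytes_per_float

def fd_payload_bytes(num_labels, logit_dim, bytes_per_float=4):
    # Federated distillation uploads only per-label average logits,
    # so the payload is independent of model size.
    return num_labels * logit_dim * bytes_per_float

# Hypothetical: a 1M-parameter model, 10 labels, 10-dimensional logits.
print(fl_payload_bytes(1_000_000) // fd_payload_bytes(10, 10))  # -> 10000
```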
( 2
min )
This work introduces the first toolkit around path-norms that is fully able
to encompass general DAG ReLU networks with biases, skip connections and any
operation based on the extraction of order statistics: max pooling, GroupSort
etc. This toolkit notably allows us to establish generalization bounds for
modern neural networks that are not only the most widely applicable path-norm
based ones, but also recover or beat the sharpest known bounds of this type.
These extended path-norms further enjoy the usual benefits of path-norms: ease
of computation, invariance under the symmetries of the network, and improved
sharpness on feedforward networks compared to the product of operator norms,
the most commonly used alternative complexity measure.
The versatility of the toolkit and its ease of implementation allow us to
challenge the concrete promises of path-norm-based generalization bounds, by
numerically evaluating the sharpest known bounds for ResNets on ImageNet.
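For plain feedforward ReLU networks (the simple case, not the general DAG toolkit described above), the classical l1 path-norm can be computed layer by layer without enumerating paths, as 1^T |W_L| ... |W_1| 1. A minimal sketch, with the function name my own:

```python
def l1_path_norm(weights):
    # weights: list of layer matrices W1..WL, each as a list of rows
    # (rows index output units). The l1 path-norm sums the absolute value
    # of the product of weights along every input-output path, computed
    # efficiently by propagating a vector of ones through |W| layer by layer.
    v = [1.0] * len(weights[0][0])  # one entry per input coordinate
    for W in weights:
        v = [sum(abs(w) * x for w, x in zip(row, v)) for row in W]
    return sum(v)

W1 = [[1.0, -2.0], [0.5, 1.0]]  # hidden layer, 2 units
W2 = [[1.0, -1.0]]              # output layer, 1 unit
print(l1_path_norm([W1, W2]))   # -> 4.5
```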
( 2
min )
A metric tensor for Riemann manifold Monte Carlo particularly suited for
non-linear Bayesian hierarchical models is proposed. The metric tensor is built
from symmetric positive semidefinite log-density gradient covariance (LGC)
matrices, which are also proposed and further explored here. The LGCs
generalize the Fisher information matrix by measuring the joint information
content and dependence structure of both a random variable and the parameters
of said variable. Consequently, positive definite Fisher/LGC-based metric
tensors may be constructed not only from the observation likelihoods as is
current practice, but also from arbitrarily complicated non-linear prior/latent
variable structures, provided the LGC may be derived for each conditional
distribution used to construct said structures. The proposed methodology is
highly automatic and allows for exploitation of any sparsity associated with
the model in question. When implemented in conjunction with a Riemann manifold
variant of the recently proposed numerical generalized randomized Hamiltonian
Monte Carlo processes, the proposed methodology is highly competitive, in
particular for the more challenging target distributions associated with
Bayesian hierarchical models.
( 2
min )
Inference on modern Bayesian Neural Networks (BNNs) often relies on a
variational inference treatment, imposing often-violated assumptions of
independence and a restricted posterior form. Traditional MCMC approaches
avoid these assumptions at the cost of increased computation, owing to their
incompatibility with subsampling of the likelihood. New Piecewise
Deterministic Markov Process (PDMP) samplers permit subsampling, though they
introduce a model-specific inhomogeneous Poisson process (IPP) that is
difficult to sample from. This
work introduces a new generic and adaptive thinning scheme for sampling from
these IPPs, and demonstrates how this approach can accelerate the application
of PDMPs for inference in BNNs. Experiments illustrate that inference with
these methods is computationally feasible, can improve predictive accuracy and
MCMC mixing performance, and provides informative uncertainty measurements
when compared against other approximate inference schemes.
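Thinning in its generic form (Lewis-Shedler), which the adaptive scheme above builds on: propose events from a homogeneous Poisson process whose rate dominates the target intensity, then accept probabilistically. A minimal sketch, not the paper's adaptive method; names are my own:

```python
import math
import random

def sample_ipp_thinning(rate, rate_bound, horizon, seed=0):
    # Thinning: propose events from a homogeneous Poisson process with
    # intensity rate_bound >= rate(t) for all t in [0, horizon], then
    # accept each proposal with probability rate(t) / rate_bound.
    rng = random.Random(seed)
    events, t = [], 0.0
    while True:
        t += rng.expovariate(rate_bound)
        if t > horizon:
            return events
        if rng.random() < rate(t) / rate_bound:
            events.append(t)

# Intensity 1 + sin(t)^2 is bounded above by 2 on the whole horizon.
events = sample_ipp_thinning(lambda t: 1.0 + math.sin(t) ** 2, 2.0, 10.0)
print(len(events), "accepted events on [0, 10]")
```

The adaptive part of the paper's scheme lies in constructing a tight `rate_bound` on the fly; a loose bound wastes proposals but stays correct.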
( 2
min )
Compressing neural networks is a key step when deploying models for real-time
or embedded applications. Factorizing the model's matrices using low-rank
approximations is a promising method for achieving compression. While it is
possible to set the rank before training, this approach is neither flexible nor
optimal. In this work, we propose a post-training rank-selection method called
Rank-Tuning that selects a different rank for each matrix. Used in combination
with training adaptations, our method achieves high compression rates with
little or no performance degradation. Our numerical experiments on signal
processing tasks show that we can compress recurrent neural networks up to 14x
with at most 1.4% relative performance reduction.
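One common post-hoc criterion for picking a per-matrix rank is to keep the smallest rank that retains a given fraction of the squared singular-value spectrum. This is an illustrative heuristic, not necessarily the paper's Rank-Tuning rule:

```python
import numpy as np

def select_rank(W, energy=0.999):
    # Smallest rank whose leading singular values retain `energy` of the
    # squared spectrum; a simple stand-in for per-matrix rank selection.
    s = np.linalg.svd(np.asarray(W, dtype=float), compute_uv=False)
    cum = np.cumsum(s ** 2) / np.sum(s ** 2)
    return int(np.searchsorted(cum, energy) + 1)

# A matrix with three dominant singular values and one negligible one.
W = np.diag([10.0, 5.0, 1.0, 0.01])
print(select_rank(W))  # -> 3
```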
( 2
min )
We study the performance of empirical risk minimization on the $p$-norm
linear regression problem for $p \in (1, \infty)$. We show that, in the
realizable case, under no moment assumptions, and up to a
distribution-dependent constant, $O(d)$ samples are enough to exactly recover
the target. Otherwise, for $p \in [2, \infty)$, and under weak moment
assumptions on the target and the covariates, we prove a high probability
excess risk bound on the empirical risk minimizer whose leading term matches,
up to a constant that depends only on $p$, the asymptotically exact rate. We
extend this result to the case $p \in (1, 2)$ under mild assumptions that
guarantee the existence of the Hessian of the risk at its minimizer.
( 2
min )
We introduce a novel approach to explaining the out-of-sample performance of
random forest (RF) models by exploiting the fact that any RF can be formulated
as an adaptive weighted k-nearest-neighbors model. Specifically, we use the
proximity between points in the feature space learned by the RF to re-write
random forest predictions exactly as a weighted average of the target labels of
training data points. This linearity facilitates a local notion of
explainability of RF predictions that generates attributions for any model
prediction across observations in the training set, and thereby complements
established methods like SHAP, which instead generate attributions for a model
prediction across dimensions of the feature space. We demonstrate this approach
in the context of a bond pricing model trained on US corporate bond trades, and
compare our approach to various existing approaches to model explainability.
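The weighted-average identity above can be made concrete with a toy "forest" of depth-1 stumps: each tree predicts the mean label of the query's leaf, and averaging trees is exactly a convex combination of training labels with proximity weights. A minimal sketch, names my own:

```python
def rf_as_weighted_knn(leaf_fns, X, y, x):
    # Each "tree" maps a point to a leaf id; a tree's prediction is the
    # mean label of the query's leaf. Averaging over trees therefore equals
    # a weighted average of training labels with proximity-based weights.
    n = len(X)
    weights = [0.0] * n
    for leaf in leaf_fns:
        lx = leaf(x)
        idx = [i for i in range(n) if leaf(X[i]) == lx]
        for i in idx:
            weights[i] += 1.0 / (len(idx) * len(leaf_fns))
    return weights, sum(w * yi for w, yi in zip(weights, y))

X = [0.1, 0.4, 0.6, 0.9]
y = [1.0, 2.0, 3.0, 4.0]
trees = [lambda v: v < 0.5, lambda v: v < 0.7]  # two depth-1 "trees"
w, pred = rf_as_weighted_knn(trees, X, y, 0.3)
print(pred, sum(w))  # -> 1.75 1.0 (a convex combination of the labels)
```

The weight on each training label is the attribution the approach above reads off directly.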
( 2
min )
Customers of every size and industry are innovating on AWS by infusing machine learning (ML) into their products and services. Recent developments in generative AI models have further sped up the need of ML adoption across industries. However, implementing security, data privacy, and governance controls are still key challenges faced by customers when implementing ML […]
( 16
min )
This is a guest post co-written by Rama Badrinath, Divay Jindal and Utkarsh Agrawal at Meesho. Meesho is India’s fastest growing ecommerce company with a mission to democratize internet commerce for everyone and make it accessible to the next billion users of India. Meesho was founded in 2015 and today focuses on buyers and sellers […]
( 6
min )
GPU-powered surgical-simulation devices are helping train more than 2,000 doctors a year in lower-income countries to treat cataract blindness, the world’s leading cause of blindness, thanks to the nonprofit HelpMeSee. While cataract surgery has a success rate of around 99%, many patients in low- and middle-income countries lack access to the common procedure due to Read article >
( 6
min )
A new AI agent developed by NVIDIA Research that can teach robots complex skills has trained a robotic hand to perform rapid pen-spinning tricks — for the first time as well as a human can. The stunning prestidigitation, showcased in the video above, is one of nearly 30 tasks that robots have learned to expertly Read article >
( 6
min )
Companies increasingly rely on user-generated images and videos for engagement. From ecommerce platforms encouraging customers to share product images to social media companies promoting user-generated videos and images, using user content for engagement is a powerful strategy. However, it can be challenging to ensure that this user-generated content is consistent with your policies and fosters […]
( 7
min )
High-resolution imagery is very prevalent in today’s world, from satellite imagery to drones and DLSR cameras. From this imagery, we can capture damage due to natural disasters, anomalies in manufacturing equipment, or very small defects such as defects on printed circuit boards (PCBs) or semiconductors. Building anomaly detection models using high-resolution imagery can be challenging […]
( 8
min )
Customers increasingly want to use deep learning approaches such as large language models (LLMs) to automate the extraction of data and insights. For many industries, data that is useful for machine learning (ML) may contain personally identifiable information (PII). To ensure customer privacy and maintain regulatory compliance while training, fine-tuning, and using deep learning models, […]
( 12
min )
To enable professionals worldwide to build and run AI applications right from their desktops, NVIDIA and AMD are powering a new line of workstations equipped with NVIDIA RTX Ada Generation GPUs and AMD Ryzen Threadripper PRO 7000 WX-Series CPUs. Bringing together the highest levels of AI computing, rendering and simulation capabilities, these new platforms enable Read article >
( 5
min )
Training generative AI models just got easier. NVIDIA DGX Cloud AI supercomputing platform and NVIDIA AI Enterprise software are now available in Oracle Cloud Marketplace, making it possible for Oracle Cloud Infrastructure customers to access high-performance accelerated computing and software to run secure, stable and supported production AI in just a few clicks. The addition Read article >
( 6
min )
Rush to the cloud — stream Counter-Strike 2 on GeForce NOW for the highest frame rates. Members can play through the newest chapter of Valve’s elite, competitive, first-person shooter from the cloud. It’s all part of an action-packed GFN Thursday, with 22 more games joining the cloud gaming platform’s library, including Hot Wheels Unleashed 2 Read article >
( 5
min )
We developed a safety mitigation stack to ready DALL·E 3 for wider release and are sharing updates on our provenance research.
( 3
min )
AI models that prioritize similarity falter when asked to design something completely new.
( 10
min )
The award honors research on public policy with a focus on economic and governmental reforms.
( 7
min )
Purina US, a subsidiary of Nestlé, has a long history of enabling people to more easily adopt pets through Petfinder, a digital marketplace of over 11,000 animal shelters and rescue groups across the US, Canada, and Mexico. As the leading pet adoption platform, Petfinder has helped millions of pets find their forever homes. Purina consistently […]
( 9
min )
This position research paper was presented at the 26th ACM Conference on Computer-Supported Cooperative Work and Social Computing (opens in new tab) (CSCW 2023), a premier venue for research on the design and use of technologies that affect groups, organizations, and communities. In the business world, measuring success is as critical as selecting the right […]
The post Understanding the user: How the Enterprise System Usability Scale aligns with user reality appeared first on Microsoft Research.
( 10
min )
Powerful generative AI models and cloud-native APIs and microservices are coming to the edge. Generative AI is bringing the power of transformer models and large language models to virtually every industry. That reach now includes areas that touch edge, robotics and logistics systems: defect detection, real-time asset tracking, autonomous planning and navigation, human-robot interactions and Read article >
( 8
min )
Artificial intelligence is now a household term. Responsible AI is hot on its heels. Julia Stoyanovich, associate professor of computer science and engineering at NYU and director of the university’s Center for Responsible AI, wants to make the terms “AI” and “responsible AI” synonymous. In the latest episode of the NVIDIA AI Podcast, host Noah Read article >
( 6
min )
Real-time rendering, animation and texture baking are essential workflows for 3D art production. Using the Marmoset Toolbag software, 3D artists can enhance their creative workflows and build complex 3D models without disruptions to productivity.
( 7
min )
NVIDIA founder and CEO Jensen Huang joined Hon Hai (Foxconn) Chairman and CEO Young Liu to unveil the latest in their ongoing partnership to develop the next wave of intelligent electric vehicle (EV) platforms for the global automotive market. This latest move, announced today at the fourth annual Hon Hai Tech Day in Taiwan, will Read article >
( 6
min )
Amazon Pharmacy is a full-service pharmacy on Amazon.com that offers transparent pricing, clinical and customer support, and free delivery right to your door. Customer care agents play a crucial role in quickly and accurately retrieving information related to pharmacy information, including prescription clarifications and transfer status, order and dispensing details, and patient profile information, in […]
( 8
min )
At Amazon Web Services (AWS), not only are we passionate about providing customers with a variety of comprehensive technical solutions, but we’re also keen on deeply understanding our customers’ business processes. We adopt a third-party perspective and objective judgment to help customers sort out their value propositions, collect pain points, propose appropriate solutions, and create […]
( 16
min )
Amazon Personalize has launched a new integration with Amazon OpenSearch Service that enables you to personalize search results for each user and assists in predicting their search needs. The Amazon Personalize Search Ranking plugin within OpenSearch Service allows you to improve the end-user engagement and conversion from your website and app search by taking advantage […]
( 7
min )
GeForce RTX and NVIDIA RTX GPUs, which are packed with dedicated AI processors called Tensor Cores, are bringing the power of generative AI natively to more than 100 million Windows PCs and workstations.
( 7
min )
NVIDIA today announced an update to RTX Video Super Resolution (VSR) that delivers greater overall graphical fidelity with preserved details, upscaling for native videos and support for GeForce RTX 20 Series GPUs.
( 7
min )
Researchers coaxed a family of generative AI models to work together to solve multistep robot manipulation problems.
( 11
min )
Some researchers see formal specifications as a way for autonomous systems to "explain themselves" to humans. But a new study finds that we aren't understanding.
( 9
min )
Veriff is an identity verification platform partner for innovative growth-driven organizations, including pioneers in financial services, FinTech, crypto, gaming, mobility, and online marketplaces. In this post, we show you how Veriff standardized their model deployment workflow using Amazon SageMaker, reducing costs and development time.
( 8
min )
How trustworthy are generative pre-trained transformer (GPT) models? To answer this question, University of Illinois Urbana-Champaign, together with Stanford University, University of California, Berkeley, Center for AI Safety, and Microsoft Research, released a comprehensive trustworthiness evaluation platform for large language models (LLMs), which is presented in the recent paper: DecodingTrust: A Comprehensive Assessment of Trustworthiness […]
The post DecodingTrust: A Comprehensive Assessment of Trustworthiness in GPT Models appeared first on Microsoft Research.
( 11
min )
Similar to my article series on adversarial robustness, I was planning to have a series on bit error robustness accompanied by PyTorch code. Instead, due to time constraints, I decided to condense the information into a single article. The code for the originally planned six articles is available on GitHub.
The post Benchmarking Bit Errors in Quantized Neural Networks with PyTorch appeared first on David Stutz.
( 6
min )
Maximum-type statistics of certain functions of the sample covariance matrix
of high-dimensional vector time series are studied to statistically confirm or
reject the null hypothesis that a data set has been collected under normal
conditions. The approach generalizes the case of the maximal deviation of the
sample autocovariances function from its assumed values. Within a linear time
series framework it is shown that Gumbel-type extreme value asymptotics holds
true. As applications, we discuss long-only minimal-variance portfolio
optimization and subportfolio analysis with respect to idiosyncratic risks, ETF
index tracking by sparse tracking portfolios, convolutional deep learners for
image analysis and the analysis of array-of-sensors data.
( 2
min )
The exploration of transition state (TS) geometries is crucial for
elucidating chemical reaction mechanisms and modeling their kinetics. Recently,
machine learning (ML) models have shown remarkable performance for prediction
of TS geometries. However, they require 3D conformations of reactants and
products often with their appropriate orientations as input, which demands
substantial efforts and computational cost. Here, we propose a generative
approach based on the stochastic diffusion method, namely TSDiff, for
prediction of TS geometries just from 2D molecular graphs. TSDiff outperformed
the existing ML models with 3D geometries in terms of both accuracy and
efficiency. Moreover, it enables sampling various TS conformations, because it
has learned the distribution of TS geometries for diverse reactions during
training.
Thus, TSDiff was able to find more favorable reaction pathways with lower
barrier heights than those in the reference database. These results demonstrate
that TSDiff shows promising potential for an efficient and reliable TS
exploration.
( 2
min )
This paper introduces a novel model-agnostic algorithm called adaptive
ensemble batch multi-input multi-output conformalized quantile regression
(AEnbMIMOCQR) that enables forecasters to generate multi-step-ahead prediction
intervals for a fixed pre-specified miscoverage rate in a distribution-free
manner. Our method is grounded on conformal prediction principles, however, it
does not require data splitting and provides close to exact coverage even when
the data is not exchangeable. Moreover, the resulting prediction intervals,
besides being empirically valid along the forecast horizon, do not neglect
heteroscedasticity. AEnbMIMOCQR is designed to be robust to distribution
shifts, which means that its prediction intervals remain reliable over an
unlimited period of time, without entailing retraining or imposing unrealistic
strict assumptions on the data-generating process. Through methodical
experimentation, we demonstrate that our approach outperforms other competitive
methods on both real-world and synthetic datasets. The code used in the
experimental part and a tutorial on how to use AEnbMIMOCQR can be found at the
following GitHub repository: https://github.com/Quilograma/AEnbMIMOCQR.
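The method above avoids data splitting, but the standard split-conformal construction it generalizes is worth seeing for intuition: the (1 - alpha) empirical quantile of absolute calibration residuals widens a point forecast into an interval. A minimal sketch of the classical construction, not AEnbMIMOCQR itself:

```python
import math

def split_conformal_interval(cal_residuals, point_forecast, alpha=0.1):
    # The (1 - alpha) empirical quantile of absolute calibration residuals,
    # with the usual (n + 1) finite-sample correction, widens the forecast.
    n = len(cal_residuals)
    k = min(n - 1, math.ceil((n + 1) * (1 - alpha)) - 1)
    q = sorted(abs(r) for r in cal_residuals)[k]
    return point_forecast - q, point_forecast + q

residuals = [0.2, -0.5, 0.1, 0.8, -0.3, 0.4, -0.6, 0.05, 0.9, -0.7]
lo, hi = split_conformal_interval(residuals, 10.0, alpha=0.2)
print(lo, hi)  # -> 9.2 10.8
```

Under exchangeability, the interval covers a fresh observation with probability at least 1 - alpha; the paper's contribution is keeping validity without splitting and under distribution shift.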
( 3
min )
Reinforcement Learning (RL) environments can produce training data with
spurious correlations between features due to the amount of training data or
its limited feature coverage. This can lead to RL agents encoding these
misleading correlations in their latent representation, preventing the agent
from generalising if the correlation changes within the environment or when
deployed in the real world. Disentangled representations can improve
robustness, but existing disentanglement techniques that minimise mutual
information between features require independent features, thus they cannot
disentangle correlated features. We propose an auxiliary task for RL algorithms
that learns a disentangled representation of high-dimensional observations with
correlated features by minimising the conditional mutual information between
features in the representation. We demonstrate experimentally, using continuous
control tasks, that our approach improves generalisation under correlation
shifts, as well as improving the training performance of RL algorithms in the
presence of correlated features.
( 2
min )
Hierarchical time series are common in several applied fields. The forecasts
for these time series are required to be coherent, that is, to satisfy the
constraints given by the hierarchy. The most popular technique to enforce
coherence is called reconciliation, which adjusts the base forecasts computed
for each time series. However, recent works on probabilistic reconciliation
present several limitations. In this paper, we propose a new approach based on
conditioning to reconcile any type of forecast distribution. We then introduce
a new algorithm, called Bottom-Up Importance Sampling, to efficiently sample
from the reconciled distribution. It can be used for any base forecast
distribution: discrete, continuous, or in the form of samples, providing a
major speedup compared to the current methods. Experiments on several temporal
hierarchies show a significant improvement over base probabilistic forecasts.
( 2
min )
Neural ordinary differential equations (neural ODEs) are a popular family of
continuous-depth deep learning models. In this work, we consider a large family
of parameterized ODEs with continuous-in-time parameters, which include
time-dependent neural ODEs. We derive a generalization bound for this class by
a Lipschitz-based argument. By leveraging the analogy between neural ODEs and
deep residual networks, our approach yields in particular a generalization
bound for a class of deep residual networks. The bound involves the magnitude
of the difference between successive weight matrices. We illustrate numerically
how this quantity affects the generalization capability of neural networks.
( 2
min )
Literature-Based Discovery (LBD) aims to discover new scientific knowledge by
mining papers and generating hypotheses. Standard LBD is limited to predicting
pairwise relations between discrete concepts (e.g., drug-disease links), and
ignores critical contexts like experimental settings (e.g., a specific patient
population where a drug is evaluated) and background motivations (e.g., to find
drugs without specific side effects). We address these limitations with a novel
formulation of contextualized-LBD (C-LBD): generating scientific hypotheses in
natural language, while grounding them in a context that controls the
hypothesis search space. We present a modeling framework using retrieval of
``inspirations'' from past scientific papers. Our evaluations reveal that GPT-4
tends to generate ideas with overall low technical depth and novelty, while our
inspiration prompting approaches partially mitigate this issue. Our work
represents a first step toward building language models that generate new ideas
derived from scientific literature.
( 2
min )
Evaluating the adversarial robustness of machine learning models using
gradient-based attacks is challenging. In this work, we show that
hyperparameter optimization can improve fast minimum-norm attacks by automating
the selection of the loss function, the optimizer and the step-size scheduler,
along with the corresponding hyperparameters. Our extensive evaluation
involving several robust models demonstrates the improved efficacy of fast
minimum-norm attacks when paired with hyperparameter optimization. We release
our open-source code at https://github.com/pralab/HO-FMN.
( 2
min )
This paper presents a method to efficiently classify the gastroenterologic
section of images derived from Video Capsule Endoscopy (VCE) studies by
exploring the combination of a Convolutional Neural Network (CNN) for
classification with the time-series analysis properties of a Hidden Markov
Model (HMM). It is demonstrated that successive time-series analysis identifies
and corrects errors in the CNN output. Our approach achieves an accuracy of
$98.04\%$ on the Rhode Island (RI) Gastroenterology dataset. This allows for
precise localization within the gastrointestinal (GI) tract while requiring
only approximately 1M parameters, and thus provides a method suitable for
low-power devices.
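The time-series correction step can be sketched with standard Viterbi decoding, which smooths per-frame classifier scores using transition probabilities so the decoded sequence respects the (mostly forward) ordering of GI sections. The exact HMM setup in the paper may differ; the example transition and observation values below are my own:

```python
import math

def viterbi(obs_logp, trans_logp, init_logp):
    # Standard Viterbi decoding over log-probabilities.
    S = len(init_logp)
    dp = [init_logp[s] + obs_logp[0][s] for s in range(S)]
    back = []
    for t in range(1, len(obs_logp)):
        new_dp, ptr = [], []
        for s in range(S):
            p = max(range(S), key=lambda q: dp[q] + trans_logp[q][s])
            new_dp.append(dp[p] + trans_logp[p][s] + obs_logp[t][s])
            ptr.append(p)
        dp = new_dp
        back.append(ptr)
    path = [max(range(S), key=lambda s: dp[s])]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return path[::-1]

L = math.log
trans = [[L(0.9), L(0.1)], [L(0.01), L(0.99)]]  # rarely moves backward
init = [L(0.99), L(0.01)]
# Per-frame "CNN" scores; frame 1 is a spurious vote for section 1.
obs = [[L(0.9), L(0.1)], [L(0.3), L(0.7)], [L(0.9), L(0.1)],
       [L(0.2), L(0.8)], [L(0.1), L(0.9)]]
print(viterbi(obs, trans, init))  # -> [0, 0, 0, 1, 1]
```

The raw per-frame argmax would be [0, 1, 0, 1, 1]; the transition model corrects the isolated flip at frame 1, which is the kind of CNN error the successive analysis fixes.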
( 2
min )
Bayesian Optimization (BO) is typically used to optimize an unknown function
$f$ that is noisy and costly to evaluate, by exploiting an acquisition function
that must be maximized at each optimization step. Even if provably
asymptotically optimal BO algorithms are efficient at optimizing
low-dimensional functions, scaling them to high-dimensional spaces remains an
open problem, often tackled by assuming an additive structure for $f$. By doing
so, BO algorithms typically introduce additional restrictive assumptions on the
additive structure that reduce their applicability domain. This paper contains
two main contributions: (i) we relax the restrictive assumptions on the
additive structure of $f$, at the expense of weakening the maximization
guarantees of the acquisition function, and (ii) we address the
over-exploration problem for decentralized BO algorithms. To these ends, we
propose DumBO, an asymptotically optimal decentralized BO algorithm that
achieves very competitive performance against state-of-the-art BO algorithms,
especially when the additive structure of $f$ comprises high-dimensional
factors.
( 2
min )
Model identification of battery dynamics is a central problem in energy
research; many energy management systems and design processes rely on accurate
battery models for efficiency optimization. The standard methodology for
battery modelling is traditional design of experiments (DoE), where the battery
dynamics are excited with many different current profiles and the measured
outputs are used to estimate the system dynamics. However, although it is
possible to obtain useful models with the traditional approach, the process is
time consuming and expensive because of the need to sweep many different
current-profile configurations. In the present work, a novel DoE approach is
developed based on deep reinforcement learning, which alters the configuration
of the experiments on the fly based on the statistics of past experiments.
Instead of sticking to a library of predefined current profiles, the proposed
approach modifies the current profiles dynamically by updating the output space
covered by past measurements, hence only the current profiles that are
informative for future experiments are applied. Simulations and real
experiments are used to show that the proposed approach gives models that are
as accurate as those obtained with traditional DoE while using 85% fewer
resources.
( 2
min )
We consider the problem of model selection in a high-dimensional sparse
linear regression model under the differential privacy framework. In
particular, we consider the problem of differentially private best subset
selection and study its utility guarantee. We adopt the well-known exponential
mechanism for selecting the best model, and under a certain margin condition,
we establish its strong model recovery property. However, the exponential
search space of the exponential mechanism poses a serious computational
bottleneck. To overcome this challenge, we propose a Metropolis-Hastings
algorithm for the sampling step and establish its polynomial mixing time to its
stationary distribution in the problem parameters $n,p$, and $s$. Furthermore,
we also establish approximate differential privacy for the final estimates of
the Metropolis-Hastings random walk using its mixing property. Finally, we also
perform some illustrative simulations that echo the theoretical findings of our
main results.
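The exponential mechanism referenced above can be sketched generically: sample a candidate model with probability proportional to exp(epsilon * utility / (2 * sensitivity)). This is the textbook mechanism, not the paper's Metropolis-Hastings sampler for it; the toy utility is my own:

```python
import math
import random

def exponential_mechanism(candidates, utility, epsilon, sensitivity, rng):
    # Sample c with probability proportional to
    # exp(epsilon * utility(c) / (2 * sensitivity)); shift by the max
    # logit for numerical stability before exponentiating.
    logits = [epsilon * utility(c) / (2.0 * sensitivity) for c in candidates]
    mx = max(logits)
    weights = [math.exp(l - mx) for l in logits]
    r = rng.random() * sum(weights)
    for c, w in zip(candidates, weights):
        r -= w
        if r <= 0:
            return c
    return candidates[-1]

rng = random.Random(0)
# Toy utility peaking at candidate 3; with epsilon = 10 it wins almost always.
draws = [exponential_mechanism(range(5), lambda c: -abs(c - 3), 10.0, 1.0, rng)
         for _ in range(200)]
print(max(set(draws), key=draws.count))  # -> 3
```

The computational bottleneck the paper addresses is that in best subset selection the candidate set has exponentially many models, so this direct enumeration is infeasible and MCMC sampling is needed instead.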
( 2
min )
This paper proposes a set of novel optimization algorithms for solving a
class of convex optimization problems with time-varying streaming cost
function. We develop an approach to track the optimal solution with a bounded
error. Unlike the existing results, our algorithm is executed only by using the
first-order derivatives of the cost function which makes it computationally
efficient for optimization with time-varying cost function. We compare our
algorithms to the gradient descent algorithm and show why gradient descent is
not an effective solution for optimization problems with time-varying cost.
Several examples including solving a model predictive control problem cast as a
convex optimization problem with a streaming time-varying cost function
demonstrate our results.
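The bounded-error tracking phenomenon, and why plain gradient descent lags a moving optimum, can be seen on a one-dimensional toy problem. This is an illustration of the problem setting, not the paper's algorithm; the function names are my own:

```python
def track_time_varying(alpha, steps, target, x0=0.0, dt=0.1):
    # Plain gradient descent on f(x, t) = (x - target(t))^2 / 2 applied to
    # a drifting target: the tracking error stays bounded but nonzero
    # (about dt / alpha for a constant-speed ramp), instead of vanishing.
    x, errs = x0, []
    for k in range(steps):
        t = k * dt
        x -= alpha * (x - target(t))          # first-order update only
        errs.append(abs(x - target(t + dt)))  # error against the moved target
    return errs

errs = track_time_varying(alpha=0.5, steps=100, target=lambda t: t)
print(round(errs[-1], 3))  # settles near dt / alpha = 0.2, not 0
```

The steady-state lag motivates algorithms that explicitly account for the cost's time variation while still using only first-order derivatives.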
( 2
min )
We investigate the problem of stochastic, combinatorial multi-armed bandits
where the learner only has access to bandit feedback and the reward function
can be non-linear. We provide a general framework for adapting discrete offline
approximation algorithms into sublinear $\alpha$-regret methods that only
require bandit feedback, achieving
$\mathcal{O}\left(T^\frac{2}{3}\log(T)^\frac{1}{3}\right)$ expected cumulative
$\alpha$-regret dependence on the horizon $T$. The framework only requires the
offline algorithms to be robust to small errors in function evaluation. The
adaptation procedure does not even require explicit knowledge of the offline
approximation algorithm -- the offline algorithm can be used as a black box
subroutine. To demonstrate its utility, the proposed framework is applied to
diverse problems in submodular maximization. The
new CMAB algorithms for submodular maximization with knapsack constraints
outperform a full-bandit method developed for the adversarial setting in
experiments with real-world data.
( 3
min )
Neural network pruning has shown to be an effective technique for reducing
the network size, trading desirable properties like generalization and
robustness to adversarial attacks for higher sparsity. Recent work has claimed
that adversarial pruning methods can produce sparse networks while also
preserving robustness to adversarial examples. In this work, we first
re-evaluate three state-of-the-art adversarial pruning methods, showing that
their robustness was indeed overestimated. We then compare pruned and dense
versions of the same models, discovering that samples on thin ice, i.e., closer
to the unpruned model's decision boundary, are typically misclassified after
pruning. We conclude by discussing how this intuition may lead to designing
more effective adversarial pruning methods in future work.
( 2
min )
Document-level relation extraction (DocRE) aims to extract relations of all
entity pairs in a document. A key challenge in DocRE is the cost of annotating
such data which requires intensive human effort. Thus, we investigate the case
of DocRE in a low-resource setting, and we find that existing models trained
on scarce data over-predict the NA ("no relation") label, limiting
performance. In this work, we approach the problem from a calibration
perspective and propose PRiSM, which learns to adapt logits based on relation
semantic information. We evaluate our method on three DocRE datasets and
demonstrate that integrating existing models with PRiSM improves performance by
as much as 26.38 F1 score, while the calibration error drops by a factor of up
to 36 when trained with about 3% of the data. The code is publicly available at
https://github.com/brightjade/PRiSM.
( 2
min )
We present AIRS: Automatic Intrinsic Reward Shaping that intelligently and
adaptively provides high-quality intrinsic rewards to enhance exploration in
reinforcement learning (RL). More specifically, AIRS selects a shaping
function from a predefined set in real time, based on the estimated task
return,
providing reliable exploration incentives and alleviating the biased objective
problem. Moreover, we develop an intrinsic reward toolkit to provide efficient
and reliable implementations of diverse intrinsic reward approaches. We test
AIRS on various tasks of MiniGrid, Procgen, and DeepMind Control Suite.
Extensive simulation demonstrates that AIRS can outperform the benchmarking
schemes and achieve superior performance with simple architecture.
( 2
min )
In the realm of robotics, numerous downstream robotics tasks leverage machine
learning methods for processing, modeling, or synthesizing data. Often, this
data comprises variables that inherently carry geometric constraints, such as
the unit-norm condition of quaternions representing rigid-body orientations or
the positive definiteness of stiffness and manipulability ellipsoids. Handling
such geometric constraints effectively requires the incorporation of tools from
differential geometry into the formulation of machine learning methods. In this
context, Riemannian manifolds emerge as a powerful mathematical framework to
handle such geometric constraints. Nevertheless, their recent adoption in robot
learning has been largely characterized by a mathematically-flawed
simplification, hereinafter referred to as the "single tangent space fallacy".
This approach involves merely projecting the data of interest onto a single
tangent (Euclidean) space, over which an off-the-shelf learning algorithm is
applied. This paper provides a theoretical elucidation of various
misconceptions surrounding this approach and offers experimental evidence of
its shortcomings. Finally, it presents valuable insights to promote best
practices when employing Riemannian geometry within robot learning
applications.
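The fallacy is easy to demonstrate numerically: distances measured after projecting all data into the tangent space at a single base point do not match geodesic distances on the manifold. A small sketch on the unit sphere using the standard log map (illustrative, not code from the paper):

```python
import numpy as np

def log_map(p, q):
    """Riemannian log map on the unit sphere: the tangent vector at p
    pointing toward q, with length equal to the geodesic distance."""
    theta = np.arccos(np.clip(p @ q, -1.0, 1.0))
    v = q - np.cos(theta) * p
    n = np.linalg.norm(v)
    return theta * v / n if n > 1e-12 else np.zeros_like(p)

p = np.array([1.0, 0.0, 0.0])   # the single chosen base point
q1 = np.array([0.0, 1.0, 0.0])
q2 = np.array([0.0, 0.0, 1.0])

# True geodesic distance between q1 and q2 on the sphere: pi/2
geodesic = np.arccos(np.clip(q1 @ q2, -1.0, 1.0))

# Distance after projecting both points to the tangent space at p:
# (pi/2) * sqrt(2), i.e. inflated by ~41%
tangent = np.linalg.norm(log_map(p, q1) - log_map(p, q2))
```

Learning algorithms applied in that single tangent space see the distorted distances, which is exactly the failure mode the paper dissects.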
( 2
min )
In this study, we present a graph neural network-based learning approach
using an autoencoder setup to derive low-dimensional variables from features
observed in experimental crystal structures. These variables are then biased in
enhanced sampling to observe state-to-state transitions and reliable
thermodynamic weights. Our approach uses simple convolution and pooling
methods. To verify the effectiveness of our protocol, we examined the
nucleation of various allotropes and polymorphs of iron and glycine from their
molten states. Our graph latent variables, when biased in well-tempered
metadynamics consistently show transitions between states and achieve accurate
free energy calculations in agreement with experiments, both of which are
indicators of dependable sampling. This underscores the strength and promise of
our graph neural net variables for improved sampling. The protocol shown here
should be applicable for other systems and with other sampling methods.
( 2
min )
For over two decades, detecting rare events has been a challenging task among
researchers in the data mining and machine learning domain. Real-life problems
inspire researchers to navigate and further improve data processing and
algorithmic approaches to achieve effective and computationally efficient
methods for imbalanced learning. In this paper, we have collected and reviewed
258 peer-reviewed papers from archival journals and conference proceedings in an
attempt to provide an in-depth review of various approaches in imbalanced
learning from technical and application perspectives. This work aims to provide
a structured review of methods used to address the problem of imbalanced data
in various domains and create a general guideline for researchers in academia
or industry who want to dive into the broad field of machine learning using
large-scale imbalanced data.
( 2
min )
This paper discusses the predictive performance and processes undertaken on
flight pricing data, evaluated with R² (r-squared) and RMSE, leveraging a large
dataset, originally from Expedia.com, consisting of approximately 20 million
records (4.68 gigabytes). The project aims to determine the best models usable
in the real world to predict airline ticket fares for non-stop flights across
the US. Therefore, good generalization capability and optimized processing
times are important measures for the model.
We will discover key business insights utilizing feature importance and
discuss the process and tools used for our analysis. Four regression machine
learning algorithms were utilized: Random Forest, Gradient Boost Tree, Decision
Tree, and Factorization Machines utilizing Cross Validator and Training
Validator functions for assessing performance and generalization capability.
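For reference, the two evaluation measures are straightforward to compute; a minimal sketch with illustrative fares (not values from the Expedia dataset):

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean squared error, in the same units as the fares."""
    return np.sqrt(np.mean((y_true - y_pred) ** 2))

def r2(y_true, y_pred):
    """Coefficient of determination: fraction of fare variance explained."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

# Toy fares in dollars (illustrative only).
y_true = np.array([120.0, 250.0, 90.0, 310.0])
y_pred = np.array([130.0, 240.0, 100.0, 300.0])
```

RMSE reports error magnitude in dollars while R² is scale-free, which is why the two together cover both accuracy and generalization comparisons across models.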
( 2
min )
Matrix-variate distributions are a recent addition to the model-based
clustering field, thereby making it possible to analyze data in matrix form
with complex structure such as images and time series. Due to its recent
appearance, there is limited literature on matrix-variate data, with even less
on dealing with outliers in these models. An approach for clustering
matrix-variate normal data with outliers is discussed. The approach, which uses
the distribution of subset log-likelihoods, extends the OCLUST algorithm to
matrix-variate normal data and uses an iterative approach to detect and trim
outliers.
( 2
min )
This study proposes an interpretable neural network-based non-proportional
odds model (N$^3$POM) for ordinal regression. N$^3$POM is different from
conventional approaches to ordinal regression with non-proportional models in
several ways: (1) N$^3$POM is designed to directly handle continuous responses,
whereas standard methods typically treat de facto ordered continuous variables
as discrete, (2) instead of estimating response-dependent finite coefficients
of linear models from discrete responses as is done in conventional approaches,
we train a non-linear neural network to serve as a coefficient function. Thanks
to the neural network, N$^3$POM offers flexibility while preserving the
interpretability of conventional ordinal regression. We establish a sufficient
condition under which the predicted conditional cumulative probability locally
satisfies the monotonicity constraint over a user-specified region in the
covariate space. Additionally, we provide a monotonicity-preserving stochastic
(MPS) algorithm for effectively training the neural network. We apply N$^3$POM
to several real-world datasets.
( 2
min )
Neural ordinary differential equations (neural ODEs) are a popular family of
continuous-depth deep learning models. In this work, we consider a large family
of parameterized ODEs with continuous-in-time parameters, which include
time-dependent neural ODEs. We derive a generalization bound for this class by
a Lipschitz-based argument. By leveraging the analogy between neural ODEs and
deep residual networks, our approach yields in particular a generalization
bound for a class of deep residual networks. The bound involves the magnitude
of the difference between successive weight matrices. We illustrate numerically
how this quantity affects the generalization capability of neural networks.
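The quantity entering the bound, the magnitude of the difference between successive weight matrices, can be computed directly; a toy sketch with illustrative residual-network weights (not the paper's experiments):

```python
import numpy as np

rng = np.random.default_rng(0)
depth, width = 8, 16

# Weights of a toy residual network x_{l+1} = x_l + W_l x_l / depth.
# Smoothly varying weights correspond, in the neural-ODE view, to a small
# time derivative of the continuous-in-time parameters.
smooth = [np.eye(width) + 0.01 * l * np.ones((width, width)) / width
          for l in range(depth)]
rough = [rng.standard_normal((width, width)) for _ in range(depth)]

def successive_diff(weights):
    """Max spectral norm of W_{l+1} - W_l: the quantity in the bound."""
    return max(np.linalg.norm(weights[l + 1] - weights[l], 2)
               for l in range(len(weights) - 1))
```

For the smooth stack the quantity is tiny (0.01), while for independently drawn weights it is orders of magnitude larger, mirroring the generalization gap the paper studies numerically.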
( 2
min )
Hierarchical time series are common in several applied fields. The forecasts
for these time series are required to be coherent, that is, to satisfy the
constraints given by the hierarchy. The most popular technique to enforce
coherence is called reconciliation, which adjusts the base forecasts computed
for each time series. However, recent works on probabilistic reconciliation
present several limitations. In this paper, we propose a new approach based on
conditioning to reconcile any type of forecast distribution. We then introduce
a new algorithm, called Bottom-Up Importance Sampling, to efficiently sample
from the reconciled distribution. It can be used for any base forecast
distribution: discrete, continuous, or in the form of samples, providing a
major speedup compared to the current methods. Experiments on several temporal
hierarchies show a significant improvement over base probabilistic forecasts.
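The conditioning idea behind the sampler can be sketched for a two-level hierarchy. This is a simplified illustration with Gaussian base forecasts and hand-picked means, not the paper's Bottom-Up Importance Sampling implementation: bottom-level samples are weighted by how plausible their implied total is under the upper-level base forecast, then resampled:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20000

# Incoherent base forecasts for a 2-level hierarchy: total = b1 + b2,
# but the bottom means sum to 15 while the total forecast says 18.
b1 = rng.normal(10.0, 2.0, n)
b2 = rng.normal(5.0, 1.0, n)
total_mean, total_sd = 18.0, 1.0

def normal_pdf(x, m, s):
    return np.exp(-0.5 * ((x - m) / s) ** 2) / (s * np.sqrt(2 * np.pi))

# Importance-sampling step: weight each bottom-level sample by the
# upper-level density at its implied total, then resample. The resampled
# bottom draws are coherent by construction (the total is their sum).
w = normal_pdf(b1 + b2, total_mean, total_sd)
idx = rng.choice(n, size=n, p=w / w.sum())
b1_rec, b2_rec = b1[idx], b2[idx]
```

The reconciled total concentrates between the bottom-up sum (15) and the upper-level forecast (18), as conditioning predicts.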
( 2
min )
Maximum-type statistics of certain functions of the sample covariance matrix
of high-dimensional vector time series are studied to statistically confirm or
reject the null hypothesis that a data set has been collected under normal
conditions. The approach generalizes the case of the maximal deviation of the
sample autocovariance function from its assumed values. Within a linear time
series framework, it is shown that Gumbel-type extreme value asymptotics hold
true. As applications, we discuss long-only minimal-variance portfolio
optimization and subportfolio analysis with respect to idiosyncratic risks, ETF
index tracking by sparse tracking portfolios, convolutional deep learners for
image analysis and the analysis of array-of-sensors data.
( 2
min )
We consider the problem of model selection in a high-dimensional sparse
linear regression model under the differential privacy framework. In
particular, we consider the problem of differentially private best subset
selection and study its utility guarantee. We adopt the well-known exponential
mechanism for selecting the best model, and under a certain margin condition,
we establish its strong model recovery property. However, the exponential
search space of the exponential mechanism poses a serious computational
bottleneck. To overcome this challenge, we propose a Metropolis-Hastings
algorithm for the sampling step and establish its polynomial mixing time to its
stationary distribution in the problem parameters $n$, $p$, and $s$. Furthermore,
we also establish approximate differential privacy for the final estimates of
the Metropolis-Hastings random walk using its mixing property. Finally, we also
perform some illustrative simulations that echo the theoretical findings of our
main results.
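A minimal sketch of the sampler: a Metropolis-Hastings chain over size-s subsets whose stationary distribution is the exponential mechanism. The privacy parameter and sensitivity below are illustrative values and the utility is the negative residual sum of squares; this is a toy, not a calibrated differentially private implementation:

```python
from itertools import combinations

import numpy as np

rng = np.random.default_rng(0)
n, p, s = 100, 8, 2

# Sparse linear model: only features 0 and 1 carry signal.
X = rng.standard_normal((n, p))
y = X[:, 0] * 3.0 + X[:, 1] * 2.0 + rng.standard_normal(n)

def score(subset):
    """Utility of a subset: negative residual sum of squares."""
    Xs = X[:, sorted(subset)]
    beta, *_ = np.linalg.lstsq(Xs, y, rcond=None)
    r = y - Xs @ beta
    return -r @ r

# MH chain targeting exp(eps * score / (2 * sens)): propose swapping one
# feature in the subset, accept with the exponential-mechanism ratio.
eps, sens = 1.0, 50.0          # illustrative values only
current = {0, 2}
for _ in range(500):
    out = int(rng.choice(list(current)))
    inn = int(rng.choice([j for j in range(p) if j not in current]))
    prop = (current - {out}) | {inn}
    if np.log(rng.uniform()) < eps * (score(prop) - score(current)) / (2 * sens):
        current = prop

# Exhaustive argmax of the utility, for comparison with the chain.
best = max(combinations(range(p), s), key=lambda c: score(c))
```

The chain only ever evaluates the score of neighboring subsets, avoiding the exponential search space that makes the plain exponential mechanism intractable.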
( 2
min )
Transformers pretrained on diverse tasks exhibit remarkable in-context
learning (ICL) capabilities, enabling them to solve unseen tasks solely based
on input contexts without adjusting model parameters. In this paper, we study
ICL in one of its simplest setups: pretraining a linearly parameterized
single-layer linear attention model for linear regression with a Gaussian
prior. We establish a statistical task complexity bound for the attention model
pretraining, showing that effective pretraining only requires a small number of
independent tasks. Furthermore, we prove that the pretrained model closely
matches the Bayes optimal algorithm, i.e., optimally tuned ridge regression, by
achieving nearly Bayes optimal risk on unseen tasks under a fixed context
length. These theoretical findings complement prior experimental research and
shed light on the statistical foundations of ICL.
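The Bayes-optimal baseline the pretrained model is compared against is ordinary ridge regression with the regularization set from the prior and noise variances; a minimal sketch with illustrative dimensions and variances:

```python
import numpy as np

rng = np.random.default_rng(0)
d, n_ctx = 5, 20
tau2, sigma2 = 1.0, 0.25      # prior variance, noise variance

# One in-context task: regression weights drawn from the Gaussian prior.
w = rng.normal(0.0, np.sqrt(tau2), d)
X = rng.standard_normal((n_ctx, d))
y = X @ w + rng.normal(0.0, np.sqrt(sigma2), n_ctx)

# Bayes-optimal predictor for this prior: ridge with lambda = sigma^2/tau^2
# (the posterior mean of w). A well-pretrained linear attention model is
# shown to nearly match this predictor on unseen tasks.
lam = sigma2 / tau2
w_ridge = np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

x_query = rng.standard_normal(d)
prediction = x_query @ w_ridge
```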
( 2
min )
Obtaining continuously updated predictions is a major challenge for
personalised medicine. Leveraging combinations of parametric regressions and
machine learning approaches, the personalised online super learner (POSL) can
achieve such dynamic and personalised predictions. We adapt POSL to predict a
repeated continuous outcome dynamically and propose a new way to validate such
personalised or dynamic prediction models. We illustrate its performance by
predicting the convection volume of patients undergoing hemodiafiltration. POSL
outperformed its candidate learners with respect to median absolute error,
calibration-in-the-large, discrimination, and net benefit. We finally discuss
the choices and challenges underlying the use of POSL.
( 2
min )
Companies more often pay attention to automation and innovation than to proficiency and productivity. However, firms can maintain a balance between the two thanks to the extensive usage of AI and data science programs. Here are the stats that show the impact of AI and data science in diverse sectors: Applications of AI and data science have…
The post Future of AI and data science – How to secure a bright career appeared first on Data Science Central.
( 21
min )
At SHoP Architects, a New York City-based architectural firm, Mengyi Fan and her team aim to inspire industry professionals to create visual masterpieces by incorporating emerging technologies. Fan, the director of visualization at SHoP, has expertise that spans the fields of architectural visualization and design. She takes a definitive, novel and enduring approach to designing…
( 6
min )
Posted by Nicholas Rubin, Senior Research Scientist, and Ryan Babbush, Head of Quantum Algorithms, Quantum AI Team
If you’ve paid attention to the quantum computing space, you’ve heard the claim that in the future, quantum computers will solve certain problems exponentially more efficiently than classical computers can. They have the potential to transform many industries, from pharmaceuticals to energy.
For the most part, these claims have rested on arguments about the asymptotic scaling of algorithms as the problem size approaches infinity, but this tells us very little about the practical performance of quantum computers for finite-sized problems. We want to be more concrete: Exactly which problems are quantum computers more suited to tackle than their classical counterparts, an…
( 94
min )
At one of the U.K.’s largest technology festivals, top enterprises and startups are this week highlighting their latest innovations, hosting workshops and celebrating the growing tech ecosystem based in the country’s southwest. The Bristol Technology Festival today showcased the work of nine startups that recently participated in a challenge hosted by Digital Catapult…
( 6
min )
Put the pedal to the metal this GFN Thursday as Forza Motorsport leads 23 new games in the cloud. Plus, Acer’s Predator Connect 6E is the newest addition to the GeForce NOW Recommended program, with easy cloud gaming quality-of-service (QoS) settings built in to give Ultimate members the best streaming experience. No Breaks, No Limits…
( 6
min )
These research papers were presented at the IEEE Symposium on Visual Languages and Human-Centric Computing (VL/HCC 2023), a premier forum for design, theory, and application of computing technologies for programming, modelling, and communication. Large language models (LLMs) have revolutionized the way novice programmers and everyday computer users tap into the capabilities […]
The post Microsoft at VL/HCC 2023: Focus on co-audit tools for spreadsheets appeared first on Microsoft Research.
( 10
min )
What is the optimal framework and configuration for hosting large language models (LLMs) for text-generating generative AI applications? Despite the abundance of options for serving LLMs, this is a hard question to answer due to the size of the models, varying model architectures, performance requirements of applications, and more. The Amazon SageMaker Large Model Inference […]
( 13
min )
In this post, we show how to index information stored in websites and use the intelligent search in Amazon Kendra to search for answers from content stored in internal and external websites. In addition, the ML-powered intelligent search can accurately get answers for your questions from unstructured documents with natural language narrative content, for which keyword search is not very effective.
( 7
min )
Research Focus: Principal researcher Lester Mackey recognized for pioneering statistical and ML techniques; Pareto frontiers in neural feature learning; structural inequality in the influencer industry; new research on cardinality estimation.
The post Research Focus: Week of October 9, 2023 appeared first on Microsoft Research.
( 9
min )
Developers have a new AI-powered steering wheel to help them hug the road while they drive powerful large language models (LLMs) to their desired locations. NVIDIA NeMo SteerLM lets companies define knobs to dial in a model’s responses as it’s running in production, a process called inference. Unlike current methods for customizing an LLM, it…
( 6
min )
Gartner predicts blockchain’s economic impact to reach $176 billion by 2025 and $3.1 trillion by 2030. The AI software market is expected to reach $134.8 billion by 2025. Blockchain and AI benefit businesses. AI models process data, extract insights, and make decisions. Blockchain ensures data integrity and trust among participants. Read on to discover the…
The post How does combining blockchain and AI create new business opportunities? appeared first on Data Science Central.
( 22
min )
In the contemporary digital landscape, data has emerged as a critical asset for organizations aiming to make informed decisions and foster innovation. Data analytics can unlock a treasure trove of insights, driving competitive advantage and operational excellence by leveraging the vast amounts of data generated every second. As a consequence, the demand for skilled professionals…
The post Understanding the difference: Data analyst, data scientist, and data engineer appeared first on Data Science Central.
( 24
min )
I’ve been in this industry for over 40 years (yes, I just started in the data and analytics industry when I was 11), and I have NEVER seen anything like Artificial Intelligence (AI) and Generative AI (GenAI) capture the attention of CEOs (and the dystopic fear of everyone else). Is AI a game-changer? Definitely! Will…
The post 11 Questions Every CEO Should Ask about AI / Generative AI appeared first on Data Science Central.
( 23
min )
Launched in 2021, Amazon SageMaker Canvas is a visual, point-and-click service that allows business analysts and citizen data scientists to use ready-to-use machine learning (ML) models and build custom ML models to generate accurate predictions without the need to write any code. Ready-to-use models enable you to derive immediate insights from text, image, and document […]
( 7
min )
Today, we’re excited to announce that the OpenAI Whisper foundation model is available for customers using Amazon SageMaker JumpStart. Whisper is a pre-trained model for automatic speech recognition (ASR) and speech translation. Trained on 680 thousand hours of labelled data, Whisper models demonstrate a strong ability to generalize to many datasets and domains without the need […]
( 11
min )
In this blog, you will learn to build a cloud-native federated learning (FL) architecture on AWS. By using infrastructure as code (IaC) tools on AWS, you can deploy FL architectures with ease. Also, a cloud-native architecture takes full advantage of a variety of AWS services with proven security and operational excellence, thereby simplifying the development of FL.
( 12
min )
Generative AI is helping creatives across many industries bring ideas to life at unprecedented speed. This technology will be on display at Adobe MAX, running through Thursday, Oct. 12, in person and virtually.
( 9
min )
Today, we are excited to announce that the Mistral 7B foundation models, developed by Mistral AI, are available for customers through Amazon SageMaker JumpStart to deploy with one click for running inference. With 7 billion parameters, Mistral 7B can be easily customized and quickly deployed. You can try out this model with SageMaker JumpStart, a […]
( 14
min )
According to Gartner, 85% of software buyers trust online reviews as much as personal recommendations. Customers provide feedback and reviews about products they have purchased through many channels, including review websites, vendor websites, sales calls, social media, and many others. The problem with the increasing volume of customer reviews across multiple channels is that it […]
( 7
min )
A recommendation engine is only as good as the data used to prepare it. Transforming raw data into a format that is suitable for a model is key to getting better personalized recommendations for end-users. In this post, we walk through how to prepare and import the MovieLens dataset, a dataset prepared by GroupLens research […]
( 11
min )
Posted by Sagar M. Waghmare, Senior Software Engineer, and Kimberly Wilber, Software Engineer, Google Research, Perception Team
As most people navigate their everyday world, they process visual input from the environment using an eye-level perspective. Unlike robots and self-driving cars, people don't have any "out-of-body" sensors to help guide them. Instead, a person’s sensory input is completely "egocentric", or "from the self." This also applies to new technologies that understand the world around us from a human-like perspective, e.g., robots navigating through unknown buildings, AR glasses that highlight objects, or assistive technology to help people run independently.
In computer vision, scene understanding is the subfield that studies how visible objects relate to the sce…
( 93
min )
This cutting-edge area of AI focuses on building models that can create original material, including music, images, text, and even entire virtual worlds.
The post Revolutionizing business: A look at generative AI’s real-world impact appeared first on Data Science Central.
( 20
min )
We use the maximum a posteriori estimation principle for learning
representations distributed on the unit sphere. We propose to use the angular
Gaussian distribution, which corresponds to a Gaussian projected onto the
unit sphere, and derive the associated loss function. We also consider the von
Mises-Fisher distribution, which is the conditional distribution of a Gaussian
on the unit sphere. The learned representations are pushed toward fixed
directions, which are the prior means of the Gaussians, allowing for a learning
strategy that is resilient to data drift. This makes it suitable for online continual
learning, which is the problem of training neural networks on a continuous data
stream, where multiple classification tasks are presented sequentially so that
data from past tasks are no longer accessible, and data from the current task
can be seen only once. To address this challenging scenario, we propose a
memory-based representation learning technique equipped with our new loss
functions. Our approach does not require negative data or knowledge of task
boundaries and performs well with smaller batch sizes while being
computationally efficient. We demonstrate with extensive experiments that the
proposed method outperforms the current state-of-the-art methods on both
standard evaluation scenarios and realistic scenarios with blurry task
boundaries. For reproducibility, we use the same training pipeline for every
compared method and share the code at https://t.ly/SQTj.
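The push-toward-fixed-directions idea can be sketched with a simple cosine objective. This stands in for the angular-Gaussian and von Mises-Fisher likelihoods actually derived in the paper; the prior directions below are randomly chosen placeholders:

```python
import numpy as np

rng = np.random.default_rng(0)
d, n_classes = 16, 4

# Fixed prior mean directions, one per class. They are chosen in advance
# and never updated, which is what makes the targets immune to data drift.
mu = rng.standard_normal((n_classes, d))
mu /= np.linalg.norm(mu, axis=1, keepdims=True)

def directional_loss(z, labels):
    """Sketch of a loss pulling unit-normalized representations z toward
    the fixed prior direction of their class (1 - cosine similarity)."""
    z = z / np.linalg.norm(z, axis=1, keepdims=True)
    return np.mean(1.0 - np.sum(z * mu[labels], axis=1))

labels = np.array([0, 1, 2, 3])
aligned = directional_loss(mu[labels] * 5.0, labels)         # on-target: 0
random_z = directional_loss(rng.standard_normal((4, d)), labels)
```

No negatives are needed: each sample only interacts with its own class direction, matching the negative-free property claimed for the method.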
( 3
min )
Markov Decision Processes (MDPs) are a formal framework for modeling and
solving sequential decision-making problems. In finite-time horizons, such
problems are relevant, for instance, for optimal stopping or specific supply
chain problems, but also in the training of large language models. In contrast
to infinite-horizon MDPs, optimal policies are not stationary; policies must be
learned for every single epoch. In practice, all parameters are often trained
simultaneously, ignoring the inherent structure suggested by dynamic
programming. This paper introduces a combination of dynamic programming and
policy gradient called dynamic policy gradient, where the parameters are
trained backwards in time. For the tabular softmax parametrisation we carry out
the convergence analysis for simultaneous and dynamic policy gradient towards
global optima, both in the exact and sampled gradient settings without
regularisation. It turns out that the use of dynamic policy gradient training
much better exploits the structure of finite-time problems which is reflected
in improved convergence bounds.
( 2
min )
Detecting and discovering new gene interactions based on known gene
expressions and gene interaction data presents a significant challenge. Various
statistical and deep learning methods have attempted to tackle this challenge
by leveraging the topological structure of gene interactions and gene
expression patterns to predict novel gene interactions. In contrast, some
approaches have focused exclusively on utilizing gene expression profiles. In
this context, we introduce GENER, a parallel-layer deep learning network
designed exclusively for the identification of gene-gene relationships using
gene expression data. We conducted two training experiments and compared the
performance of our network with that of existing statistical and deep learning
approaches. Notably, our model achieved an average AUROC score of 0.834 on the
combined BioGRID&DREAM5 dataset, outperforming competing methods in predicting
gene-gene interactions.
( 2
min )
We propose a new gradient descent algorithm with added stochastic terms for
finding the global optimizers of nonconvex optimization problems. A key
component in the algorithm is the adaptive tuning of the randomness based on
the value of the objective function. In the language of simulated annealing,
the temperature is state-dependent. With this, we prove the global convergence
of the algorithm with an algebraic rate both in probability and in the
parameter space. This is a significant improvement over the classical rate
obtained with a more straightforward control of the noise term. The convergence proof
is based on the actual discrete setup of the algorithm, not just its continuous
limit as often done in the literature. We also present several numerical
examples to demonstrate the efficiency and robustness of the algorithm for
reasonably complex objective functions.
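A minimal one-dimensional sketch of the idea, with an illustrative double-well objective and a hand-picked temperature schedule proportional to the objective value (the paper's algorithm and rate analysis are more refined than this):

```python
import numpy as np

rng = np.random.default_rng(0)

def f(x):
    # Double well: local minimum near x = 1 (f ~ 0.4),
    # global minimum near x = -1 (f ~ 0).
    return (x**2 - 1.0) ** 2 + 0.2 * (x + 1.0)

def grad(x):
    return 4.0 * x * (x**2 - 1.0) + 0.2

h = 0.01
x = 1.0            # start in the shallower (worse) well
for _ in range(5000):
    # State-dependent temperature: noise scales with the current objective
    # value, so the iterate is noisy while f is large (escaping poor wells)
    # and descends near-deterministically as f approaches its minimum.
    T = max(f(x), 0.0)
    x = x - h * grad(x) + np.sqrt(2.0 * h * T) * rng.standard_normal()
```

With a constant temperature the noise never switches off near the optimum; making it state-dependent is what enables the algebraic convergence rate.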
( 2
min )
An extension of Transformers is proposed that enables explicit relational
reasoning through a novel module called the Abstractor. At the core of the
Abstractor is a variant of attention called relational cross-attention. The
approach is motivated by an architectural inductive bias for relational
learning that disentangles relational information from extraneous features
about individual objects. This enables explicit relational reasoning,
supporting abstraction and generalization from limited data. The Abstractor is
first evaluated on simple discriminative relational tasks and compared to
existing relational architectures. Next, the Abstractor is evaluated on purely
relational sequence-to-sequence tasks, where dramatic improvements are seen in
sample efficiency compared to standard Transformers. Finally, Abstractors are
evaluated on a collection of tasks based on mathematical problem solving, where
modest but consistent improvements in performance and sample efficiency are
observed.
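The core mechanism can be sketched in a few lines: attention scores are computed from the objects, but the values are learned, input-independent symbols, so the output carries only relational information. The shapes and the single-head form below are illustrative assumptions, not the paper's full architecture:

```python
import numpy as np

rng = np.random.default_rng(0)
n_obj, d_obj, d_sym = 3, 8, 4

E = rng.standard_normal((n_obj, d_obj))      # object encodings
W_q = rng.standard_normal((d_obj, d_obj))    # query projection
W_k = rng.standard_normal((d_obj, d_obj))    # key projection
S = rng.standard_normal((n_obj, d_sym))      # learned, input-independent symbols

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

# Relational cross-attention: the attention matrix R encodes object-object
# relations, but the values are the symbols S, which contain no features of
# the individual objects. The output thus depends on the objects only
# through their relations, disentangling relations from object features.
R = softmax((E @ W_q) @ (E @ W_k).T / np.sqrt(d_obj))
abstract_states = R @ S
```

In standard self-attention the values are projections of the inputs, so object features leak into the output; replacing them with symbols is the architectural inductive bias described above.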
( 2
min )
In this paper, we introduce a novel class of graphical models for
representing time lag specific causal relationships and independencies of
multivariate time series with unobserved confounders. We completely
characterize these graphs and show that they constitute proper subsets of the
currently employed model classes. As we show, from the novel graphs one can
thus draw stronger causal inferences -- without additional assumptions. We
further introduce a graphical representation of Markov equivalence classes of
the novel graphs. This graphical representation contains more causal knowledge
than what current state-of-the-art causal discovery algorithms learn.
( 2
min )
We propose a graphical structure for structural equation models that is
stable under marginalization, assuming linearity and Gaussianity. We
show that computing the maximum likelihood estimation of this model is
equivalent to training a neural network. We implement a GPU-based algorithm
that computes the maximum likelihood estimation of these models.
( 2
min )
Neural networks have shown remarkable performance in computer vision, but
their deployment in numerous scientific and technical fields is challenging due
to their black-box nature. Scientists and practitioners need to evaluate the
reliability of a decision, i.e., to know simultaneously if a model relies on
the relevant features and whether these features are robust to image
corruptions. Existing attribution methods aim to provide human-understandable
explanations by highlighting important regions in the image domain, but fail to
fully characterize a decision process's reliability. To bridge this gap, we
introduce the Wavelet sCale Attribution Method (WCAM), a generalization of
attribution from the pixel domain to the space-scale domain using wavelet
transforms. Attribution in the wavelet domain reveals where and on what
scales the model focuses, thus enabling us to assess whether a decision is
reliable.
( 3
min )
We identify hidden layers inside a DNN with group actions on the data space,
and formulate the DNN as a dual voice transform with respect to the Koopman
operator, a linear representation of the group action. Based on the group
theoretic arguments, particularly by using Schur's lemma, we show a simple
proof of the universality of those DNNs.
( 2
min )
We study the problem of training a flow-based generative model, parametrized
by a two-layer autoencoder, to sample from a high-dimensional Gaussian mixture.
We provide a sharp end-to-end analysis of the problem. First, we provide a
tight closed-form characterization of the learnt velocity field, when
parametrized by a shallow denoising auto-encoder trained on a finite number $n$
of samples from the target distribution. Building on this analysis, we provide
a sharp description of the corresponding generative flow, which pushes the base
Gaussian density forward to an approximation of the target density. In
particular, we provide closed-form formulae for the distance between the mean
of the generated mixture and the mean of the target mixture, which we show
decays as $\Theta_n(\frac{1}{n})$. Finally, this rate is shown to be in fact
Bayes-optimal.
( 2
min )
While artificial neural networks have demonstrated exceptional practical
success in a variety of domains, investigations into their theoretical
characteristics, such as their approximation power, statistical properties, and
generalization performance, have concurrently made significant strides. In this
paper, we construct a novel theory for understanding the effectiveness of
neural networks, which offers a perspective distinct from prior research.
Specifically, we explore the rationale underlying a common practice during the
construction of neural network models: sample splitting. Our findings indicate
that the optimal hyperparameters derived from sample splitting can enable a
neural network model that asymptotically minimizes the prediction risk. We
conduct extensive experiments across different application scenarios and
network architectures, and the results manifest our theory's effectiveness.
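The practice the theory addresses is easy to state concretely: split the sample, fit under each hyperparameter on one half, and keep the hyperparameter with the smallest prediction risk on the held-out half. A minimal ridge-regression sketch with illustrative data (not the paper's experiments):

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 200, 10
w = rng.standard_normal(d)
X = rng.standard_normal((n, d))
y = X @ w + rng.normal(0.0, 1.0, n)

# Sample splitting: one half for fitting, one half for hyperparameter
# selection. The theory above studies why the selected hyperparameter
# yields an asymptotically risk-minimizing model.
X_tr, y_tr = X[:100], y[:100]
X_val, y_val = X[100:], y[100:]

def ridge_fit(lam):
    return np.linalg.solve(X_tr.T @ X_tr + lam * np.eye(d), X_tr.T @ y_tr)

grid = [0.01, 0.1, 1.0, 10.0, 100.0]
val_risk = {lam: np.mean((y_val - X_val @ ridge_fit(lam)) ** 2) for lam in grid}
best_lam = min(val_risk, key=val_risk.get)
```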
( 2
min )
We propose conditional flows of the maximum mean discrepancy (MMD) with the
negative distance kernel for posterior sampling and conditional generative
modeling. This MMD, which is also known as energy distance, has several
advantageous properties like efficient computation via slicing and sorting. We
approximate the joint distribution of the ground truth and the observations
using discrete Wasserstein gradient flows and establish an error bound for the
posterior distributions. Further, we prove that our particle flow is indeed a
Wasserstein gradient flow of an appropriate functional. The power of our method
is demonstrated by numerical examples including conditional image generation
and inverse problems like superresolution, inpainting and computed tomography
in low-dose and limited-angle settings.
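The "slicing and sorting" remark refers to the fact that, in one dimension, the energy distance can be evaluated in O(n log n) instead of O(n^2). A minimal sketch (the function names and the V-statistic normalisation are my own choices, not the paper's code):

```python
import numpy as np

def _pairwise_abs_sum(x):
    """Sum of |x_i - x_j| over all pairs i < j, in O(n log n) via sorting."""
    x = np.sort(np.asarray(x, dtype=float))
    n = len(x)
    # After sorting, x[i] enters i pairs with a plus sign and n-1-i with a minus.
    return float(np.dot(2 * np.arange(n) - n + 1, x))

def energy_distance_1d(x, y):
    """Energy distance (squared MMD with kernel k(a, b) = -|a-b|), V-statistic."""
    n, m = len(x), len(y)
    # Cross-sample pairs are all pairs of the pooled sample minus the
    # within-sample pairs.
    cross = (_pairwise_abs_sum(np.concatenate([x, y]))
             - _pairwise_abs_sum(x) - _pairwise_abs_sum(y))
    return (2.0 * cross / (n * m)
            - 2.0 * _pairwise_abs_sum(x) / n**2
            - 2.0 * _pairwise_abs_sum(y) / m**2)
```

For d-dimensional samples, the sliced variant applies this 1D computation to random projections and averages the results, which is what makes the gradient-flow steps cheap.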
( 2
min )
In this post, we elucidate the simple yet powerful idea of combining user profiles and item attributes to generate personalized content recommendations using LLMs. As demonstrated throughout the post, these models hold immense potential in generating high-quality, context-aware input text, which leads to enhanced recommendations. To illustrate this, we guide you through the process of integrating a feature store (representing user profiles) with an LLM to generate these personalized recommendations.
( 13
min )
In this post, we provide an overview of popular multimodality models. We also demonstrate how to deploy these pre-trained models on Amazon SageMaker. Furthermore, we discuss the diverse applications of these models, focusing particularly on several real-world scenarios, such as zero-shot tag and attribution generation for ecommerce and automatic prompt generation from images.
( 13
min )
A research team is aiming to shake up the status quo for earthquake models. Researchers from the Universities of California at Berkeley and Santa Cruz, and the Technical University of Munich recently released a paper describing a new model that delivers deep learning to earthquake forecasting. Dubbed RECAST, the model can use larger datasets and […]
( 6
min )
A persistent challenge in deep learning is optimizing neural network models for diverse hardware configurations, balancing performance and low latency. Learn how SpaceEvo automates hardware-aware neural architecture search to fine-tune DNN models for swift execution on diverse devices.
The post Efficient and hardware-friendly neural architecture search with SpaceEvo appeared first on Microsoft Research.
( 10
min )
In this post, we explain how to build and optimize a custom classification model using Amazon Comprehend. We demonstrate this by using Amazon Comprehend custom classification to build a multi-label model, and provide guidelines on how to prepare the training dataset and tune the model to meet performance metrics such as accuracy, precision, recall, and F1 score.
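For reference, the metrics mentioned above have standard definitions in the multi-label setting. The snippet below computes micro-averaged precision, recall, and F1 from binary indicator matrices; it illustrates the metric definitions only and is not Amazon Comprehend's API (the example labels are made up):

```python
import numpy as np

# Made-up example: 3 documents, 3 labels, as binary indicator matrices.
y_true = np.array([[1, 0, 1],
                   [0, 1, 0],
                   [1, 1, 0]])
y_pred = np.array([[1, 0, 0],
                   [0, 1, 1],
                   [1, 1, 0]])

tp = int(np.sum((y_true == 1) & (y_pred == 1)))   # correctly predicted labels
fp = int(np.sum((y_true == 0) & (y_pred == 1)))   # spurious labels
fn = int(np.sum((y_true == 1) & (y_pred == 0)))   # missed labels

precision = tp / (tp + fp)                        # 4 / 5 = 0.8
recall = tp / (tp + fn)                           # 4 / 5 = 0.8
f1 = 2 * precision * recall / (precision + recall)
```

Micro-averaging pools label decisions across all documents before computing the ratios, which is the usual choice when label frequencies are imbalanced.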
( 8
min )
Large language models (LLMs) have captured the imagination and attention of developers, scientists, technologists, entrepreneurs, and executives across several industries. These models can be used for question answering, summarization, translation, and more in applications such as conversational agents for customer support, content creation for marketing, and coding assistants. Recently, Meta released Llama 2 for both […]
( 7
min )
Amid the race to make AI bigger and better, Lincoln Laboratory is developing ways to reduce power, train efficiently, and make energy use transparent.
( 11
min )
HoloAssist is a new multimodal dataset consisting of 166 hours of interactive task executions with 222 participants. Discover how it offers invaluable data to advance the capabilities of next-gen AI copilots for real-world tasks.
The post HoloAssist: A multimodal dataset for next-gen AI copilots for the physical world appeared first on Microsoft Research.
( 10
min )
Connecting with researchers, collaborating across disciplines, and exploring a new city—PhD students Jennifer Scurrell and Alejandro Cuevas talk to Senior Researcher Madeleine Daepp about the internship experience at Microsoft Research.
The post Intern Insights: Dr. Madeleine Daepp with Jennifer Scurrell and Alejandro Cuevas appeared first on Microsoft Research.
( 29
min )
Just as athletes train for a game or actors rehearse for a performance, surgeons prepare ahead of an operation. Now, Atlas Meditech is letting brain surgeons experience a new level of realism in their pre-surgery preparation with AI and physically accurate simulations. Atlas Meditech, a brain-surgery intelligence platform, is adopting tools — including the MONAI […]
( 7
min )
October brings more than falling leaves and pumpkin spice lattes for GeForce NOW members. Get ready for nearly 60 new games to stream, including Forza Motorsport and 16 more PC Game Pass titles. Assassin’s Creed Mirage leads 29 new games to hit the GeForce NOW library this week. In addition, catch a challenge to earn […]
( 9
min )
For NVIDIA Senior AI Scientist Jim Fan, the video game Minecraft served as the “perfect primordial soup” for his research on open-ended AI agents. In the latest AI Podcast episode, host Noah Kravitz spoke with Fan on using large language models to create AI agents — specifically to create Voyager, an AI bot built with […]
( 6
min )
This September, I had the chance to attend the Heidelberg Laureate Forum (HLF) for the second — and probably last — time. The HLF is an incredible experience for young researchers: mirroring the Lindau Nobel Laureate Meetings, the organizers invite laureates from math and computer science together with young researchers pursuing their undergraduate, graduate, or post-doc studies. In this article, I want to share impressions and encourage students to apply next year!
The post My Impressions (and Application) of the Heidelberg Laureate Forum 2023 appeared first on David Stutz.
( 7
min )
Analyzing medical images plays a crucial role in diagnosing and treating diseases. The ability to automate this process using machine learning (ML) techniques allows healthcare professionals to more quickly diagnose certain cancers, coronary diseases, and ophthalmologic conditions. However, one of the key challenges faced by clinicians and researchers in this field is the time-consuming and […]
( 11
min )
Healthcare and life sciences (HCLS) customers are adopting generative AI as a tool to get more from their data. Use cases include document summarization to help readers focus on key points of a document and transforming unstructured text into standardized formats to highlight important attributes. With unique data formats and strict regulatory requirements, customers are […]
( 9
min )
Prior authorization is a crucial process in healthcare that involves the approval of medical treatments or procedures before they are carried out. This process is necessary to ensure that patients receive the right care and that healthcare providers are following the correct procedures. However, prior authorization can be a time-consuming and complex process that requires […]
( 7
min )
One of the most impressive generative AI applications I have seen is viperGPT. The image / site explains it best. The steps are: […] This example, from earlier this year, showed the potential of multimodal LLMs. And as of last week, that future is upon us: ChatGPT can now see, hear, and speak. What are the implications… Read More »Generative AI Megatrends: ChatGPT can see, hear and speak – but what does it mean when ChatGPT can think?
The post Generative AI Megatrends: ChatGPT can see, hear and speak – but what does it mean when ChatGPT can think? appeared first on Data Science Central.
( 20
min )
In the ever-evolving landscape of the digital era, the relentless quest for deriving actionable insights from a sea of information has become the cornerstone of innovation and strategy. As businesses and organizations strive to navigate the complex corridors of big data, the spotlight invariably falls upon the expertise of data scientists, the modern-day architects of… Read More »Cracking the code: The rising demand for data scientists in various industries
The post Cracking the code: The rising demand for data scientists in various industries appeared first on Data Science Central.
( 21
min )
I recently subscribed to OpenAI GPT-4 for the OpenAI Code Interpreter/Advanced Data Analytics. We are using it in our class at the University of Oxford. It’s really cool, and we are also awaiting the multimodal OpenAI features. Recently, a well-known AI critic said that he does not see how generative AI companies could be… Read More »Generative AI megatrends: How many LLMs would you subscribe to?
The post Generative AI megatrends: How many LLMs would you subscribe to? appeared first on Data Science Central.
( 19
min )
Designed to ensure safer skies, “Air-Guardian” blends human intuition with machine precision, creating a more symbiotic relationship between pilot and aircraft.
( 8
min )
A diverse research ecosystem is essential to realizing the promise of AI. Accelerate Foundation Models Research aims to expand access to powerful models, engaging academics outside of computer science to pursue a broad range of important opportunities.
The post Accelerate Foundation Models Research: Supporting a global academic research ecosystem for AI appeared first on Microsoft Research.
( 10
min )
With the help of AI, robots, tractors and baby strollers — even skate parks — are becoming autonomous. One developer, Kabilan KB, is bringing autonomous-navigation capabilities to wheelchairs, which could help improve mobility for people with disabilities. The undergraduate from the Karunya Institute of Technology and Sciences in Coimbatore, India, is powering his autonomous wheelchair […]
( 6
min )
Releasing a 3D tutorial dubbed The Easiest VFX Tutorial Ever takes supreme confidence and the skills to back it up. Steve Lund a.k.a. CG Geek — the featured artist of this week’s In the NVIDIA Studio installment — has both in spades.
( 8
min )
Today, we are excited to announce Code Llama foundation models, developed by Meta, are available for customers through Amazon SageMaker JumpStart to deploy with one click for running inference. Code Llama is a state-of-the-art large language model (LLM) capable of generating code and natural language about code from both code and natural language prompts. Code […]
( 11
min )
A successful deployment of a machine learning (ML) model in a production environment heavily relies on an end-to-end ML pipeline. Although developing such a pipeline can be challenging, it becomes even more complex when dealing with an edge ML use case. Machine learning at the edge is a concept that brings the capability of running […]
( 10
min )
In Part 1 of this series, we drafted an architecture for an end-to-end MLOps pipeline for a visual quality inspection use case at the edge. It is architected to automate the entire machine learning (ML) process, from data labeling to model training and deployment at the edge. The focus on managed and serverless services reduces […]
( 9
min )
This is Part 3 of our series where we design and implement an MLOps pipeline for visual quality inspection at the edge. In this post, we focus on how to automate the edge deployment part of the end-to-end MLOps pipeline. We show you how to use AWS IoT Greengrass to manage model inference at the […]
( 9
min )
By focusing on causal relationships in genome regulation, a new AI method could help scientists identify new immunotherapy techniques or regenerative therapies.
( 10
min )
AI Weirdness: the strange side of machine learning
( 2
min )
Sponsored Post Attend the Data Science Symposium 2022 on November 8 The Center for Business Analytics at the University of Cincinnati will present its annual Data Science Symposium 2022 on November 8. This all-day, in-person event will have three featured speakers and two tech talk tracks with four concurrent presentations in each track. The […]
The post Attend the Data Science Symposium 2022, November 8 in Cincinnati appeared first on Machine Learning Mastery.
( 10
min )